TOWARDS UNDERSTANDING URBAN PATTERNS AND STRUCTURES

Intelligent urban design is a set of principles for desirable future urban structures. Existing urban structures can be analysed using remotely sensed images. To make this analysis faster and more objective, automation is proposed in this work. Automatic Gestalt perception is distinguished from automatic knowledge-based analysis; both will be required. For the Gestalt side an algebraic approach is used. This Gestalt algebra operates on a 6-D domain containing position, orientation, frequency, scale, and assessment. It defines how to form aggregates from parts. Any Gestalt can be combined with arbitrary others, but good assessments are achieved only if the parts are mutually in Gestalt arrangements. There are operations for mirror symmetry, good continuation in rows, and rotational symmetry. In this paper experiments are made only with mirror symmetry and row continuation. Example images of Thimphu, Bhutan, and Phoenix, Arizona, were obtained using Google Earth. The results are to a large degree in accordance with human perceptual grouping. Some illusory groupings not in accordance with human perception, as well as examples salient to humans that are not instantiated by the system, are also discussed.


INTRODUCTION
Urban Structures
(Benninger, 2001) lists ten principles of intelligent urban design: balance with nature, balance with tradition, appropriate technology, conviviality, efficiency, human scale, opportunity matrix, regional integration, balanced movement, and institutional integrity. Each is elaborated by examples on many scales, from individual houses to provinces, yielding the pattern of an ideal, pleasant urban environment; this is more a goal than the description of an existing city anywhere in the world. Today, remotely sensed images provide a huge mass of data on any city in the world. Such images are available at any scale, from below one metre per pixel to kilometres per pixel. In particular, virtual globe systems such as Google Earth offer a comfortable interface and a very rich database. So in principle it should be possible to identify deviations between such an ideal and reality and, e.g., to pick out specifically good and bad examples in order to learn from them. Doing so straightforwardly requires a lot of zooming in and out, looking, thinking, acquiring additional knowledge, and contemplation. Some automation and tools would probably help. Automation will not only save time; it will also help in making the outcome less subjective.

Gestalt and Knowledge
Pre-attentive, unaware grouping of objects into probably relevant aggregates is referred to as Gestalt perception. Psychologists have measured that, e.g., mirror symmetry is a very fast and important grouping mechanism for pictorial data in human observers; see (Sassi et al., 2014; Treder, 2010). In this paper we take the view that such grouping processes also contribute much to the understanding of urban patterns from remotely sensed images by human observers. Thus mathematical models and corresponding search procedures should be investigated if automation of such understanding is the goal.
On the other hand, human understanding of urban patterns from remotely sensed images will also be guided by knowledge. Mostly, this guidance is slower and can be accompanied by awareness, so that self-inspection or expert interviews may help in the construction of corresponding artificial intelligence systems. Mutual constraints and relations between meaningful objects in a plane can, e.g., be captured by syntactic structures. Other possible machine-readable formats include semantic nets, ontologies, etc. Such formats allow automatic inference.

Related Work
Knowledge-based approaches to urban pattern understanding have a long history, a prominent example being (Matsuyama & Hwang, 1990). Another example of such early syntactic understanding systems, working from pixel scale to nested urban structures of 1000-pixel scale, is (Füger et al. 1992). In this millennium the interest lies more in making such approaches more robust by the use of learning and statistical inference (Zhu et al. 2009). Some of our own previous work captured perceptual grouping according to Gestalt principles inside a knowledge-based approach, namely in production systems (Michaelsen et al., 2010; Michaelsen et al., 2006). One main intention of the contribution at hand is to separate the perceptual (pre-attentive) grouping from the (artificial-intelligence-based) knowledge utilization.

Domain
Following (Michaelsen, 2014), the components of the Gestalt domain are named position, orientation, frequency, scale, and assessment. We are not concerned with knowledge-based remote sensing here, so we do not call our entities 'building', 'road', 'roundabout', or 'city centre'. To avoid this, the term 'Gestalt' is used. The meaning of the position and scale attributes is self-evident. If a Gestalt g has frequency n, it is self-similar with respect to rotations from S_n. The orientation is then the phase of such a rotation with respect to some specified direction (e.g. East). Assessment = 0 means 'meaningless', while assessment = 1 means 'almost surely meaningful'.
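The domain described above can be pictured as a small record type. The following is only a sketch under stated assumptions: the class name `Gestalt`, the field names, and the helper `canonical_orientation` are hypothetical, and "rotations from S_n" is read here as rotations by multiples of 2π/n, which is an interpretation and not taken from the cited work.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Gestalt:
    """Hypothetical record for one element of the 6-D Gestalt domain."""
    x: float            # position, first coordinate
    y: float            # position, second coordinate
    orientation: float  # phase in radians, measured from a reference direction
    frequency: int      # n: assumed self-similarity under rotations by 2*pi/n
    scale: float        # size, in the same units as the position
    assessment: float   # 0 = meaningless, 1 = almost surely meaningful

    def __post_init__(self):
        if self.frequency < 1:
            raise ValueError("frequency must be a positive integer")
        if self.scale <= 0:
            raise ValueError("scale must be positive")
        if not 0.0 <= self.assessment <= 1.0:
            raise ValueError("assessment must lie in [0, 1]")


def canonical_orientation(g: Gestalt) -> float:
    """Reduce the orientation modulo 2*pi/n (assumed rotational period)."""
    period = 2.0 * math.pi / g.frequency
    return g.orientation % period
```

With frequency 2, for example, orientations that differ by π describe the same Gestalt under this reading, so `canonical_orientation` maps both to the same value.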

Operations
Three operations are given for the Gestalt domain in (Michaelsen & Yashina, 2014):
A binary mirror operation (2) yielding assessment 1 for g|h if g and h are mutually in mirror symmetry, in proximity, and of similar scale, and both also have assessment 1. Otherwise the assessment of g|h will be smaller than 1, but the operation is still defined. Aggregate Gestalten formed by such terms all have frequency 2.
An n-ary row-forming operation (3) yielding assessment 1 for ∑g_1 … g_n if g_1 … g_n form a perfect row, are mutually in proximity and of similar scale, and all also have assessment 1. Again, any deviations lead to a smaller assessment of the aggregate. Aggregate Gestalten formed by such terms all have frequency 2.
An n-ary operation preferring rotational patterns, yielding assessment 1 for ∏g_1 … g_n if g_1 … g_n form a perfect rotational mandala, are mutually in proximity and of similar scale, and all also have assessment 1. Again, any deviations lead to a smaller assessment of the aggregate. Aggregate Gestalten formed by such terms have frequency n.
For the details of the operations and proofs of algebraic closure we refer to (Michaelsen & Yashina, 2014). There is, however, one change: Formulae (4) and (9), or (4) and (10), of (Michaelsen, 2014) give the proximity-to-scale component of the assessment functions in a similar manner as (6) of this paper. Such a function gives the same value for a ratio r as for 1/r. This turned out to be less in accordance with human proximity grouping. We replaced it by
$$p(r) = \sqrt{e}\, r\, e^{-r^2/2},$$
where r is again the ratio between the position distance and the mid-scale of the part Gestalten at hand. The root of the Euler constant e is included here so that the maximal value p = 1 is attained at r = 1. This function resembles in form the density of a Rayleigh distribution. Its tail for r > 1 has less mass than that of the original formulae of (Michaelsen & Yashina, 2014).
Unfortunately, Gestalt algebra for the time being gives no operation for 2-D grid symmetries. Such an operation might be appropriate for scenes like the Phoenix images below.
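The properties stated for the replacement proximity function (Rayleigh-like shape, a factor of the root of e, and maximum p = 1 at r = 1) are met by p(r) = √e · r · exp(−r²/2). The sketch below assumes exactly this form; the function name `proximity_assessment` is hypothetical. It illustrates that the function is no longer symmetric in r and 1/r: parts closer than their mid-scale are penalized less than parts that are equally many times farther apart.

```python
import math


def proximity_assessment(r: float) -> float:
    """Rayleigh-shaped proximity score sqrt(e) * r * exp(-r^2 / 2).

    r is the ratio between the distance of the parts and their mid-scale.
    The form is an assumption derived from the verbal description.
    """
    if r < 0:
        raise ValueError("the distance-to-scale ratio r must be non-negative")
    return math.sqrt(math.e) * r * math.exp(-0.5 * r * r)


if __name__ == "__main__":
    # Compare each ratio with its reciprocal to show the asymmetry.
    for r in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(f"r = {r:4.2f}   p(r) = {proximity_assessment(r):.3f}   "
              f"p(1/r) = {proximity_assessment(1.0 / r):.3f}")
```

Setting the derivative (1 − r²) · exp(−r²/2) to zero gives r = 1 as the only maximum for r > 0, which is why the factor √e normalizes the peak to one.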

Search
Primitive Gestalten are extracted from the images using key-point detectors such as SIFT, or other appropriate methods such as super-pixel segmentation. Search strategies then build higher-order Gestalt terms by applying the operations above recursively. Depending on the desired output, a clustering step often produces the final result, because the best Gestalten are often found in multiple, slightly different variants.

SOME EXPERIMENTS
In the context of the principles of urban design according to (Benninger, 2001), the city of Thimphu in Bhutan is often mentioned. For comparison, a typical automobile-dominated city was chosen: Phoenix, Arizona.

Image Capture
The images were obtained from interesting structures there using the Google Earth virtual globe system; see Table 1 for the geo-coordinates and the following figures for examples. The 3D features of the virtual globe system were deactivated, the nadir view direction was chosen, and the camera-to-ground distance was mostly set to 500 m, giving a pixel size of 0.55 m on the 1100×1040 images. For comparison, images of larger scale were also included, as can be seen from Table 1. Logos, geo-coordinates, etc. have not been removed; they should not make much difference.

Primitive Extraction
Figure 1. A set of SIFT primitive Gestalten as obtained from image #9; grey tones code the assessment

Standard SIFT in a MATLAB implementation was used, yielding key-points and descriptors in the numbers indicated in column 2 of Table 1. The key-points were used as primitive Gestalten. An example is displayed in Figure 1, showing the 12619 instances obtained from picture #9 (also used in Figure 3). The acceptance of a SIFT key-point is based on a threshold on eigenvalue properties of the outer product of the smoothed gradient. Thus an assessment can be given for every instance, with value zero (indicated as white in the figure) if the threshold was just met, and one for the maximal instance in that picture.
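The text fixes only the two end points of the primitive assessment: zero where the acceptance threshold is just met and one for the strongest instance in the picture. A minimal sketch, assuming a linear mapping between those end points (the interpolation, the function name `normalize_assessments`, and the abstract notion of a scalar detector response are all assumptions, not taken from the MATLAB implementation used in the experiments):

```python
def normalize_assessments(responses, threshold):
    """Map detector responses to [0, 1]: 0 at the threshold, 1 at the maximum.

    Responses below the threshold are rejected, as a key-point detector would.
    The linear interpolation in between is an assumption for illustration.
    """
    kept = [r for r in responses if r >= threshold]
    if not kept:
        return []
    span = max(kept) - threshold
    if span == 0.0:
        # All accepted responses sit exactly on the threshold.
        return [0.0 for _ in kept]
    return [(r - threshold) / span for r in kept]
```

Because the maximum is taken per picture, assessments obtained this way are relative to one image and are not directly comparable across images.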
As the abbreviation SIFT already indicates, this extractor is strictly scale invariant, a property which should be softened for remote sensing data. Recall that for objects such as buildings a rough scale is known. Therefore, a preference for primitives of a certain size sc_pref was introduced by re-assessing each primitive. The new assessment as_new equals the old assessment as_old only if the scale sc of the primitive Gestalt equals the preferred value; large deviations are penalized. For the experiments reported here, sc_pref = 50 pixels was set. In order not to overload the search, only the 1200 best-assessed primitives are kept.
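The re-assessment step can be sketched as follows. The penalty used here, exp(2 − r − 1/r) with r = sc / sc_pref, is a stand-in chosen only because it equals one at the preferred scale, drops for large deviations, and gives the same value for r and 1/r; it is an assumption and not necessarily the function used in the experiments. The names `scale_preference` and `reassess_and_select` are hypothetical.

```python
import math


def scale_preference(sc, sc_pref=50.0):
    """Assumed penalty: 1 at sc == sc_pref, smaller otherwise, same for r and 1/r."""
    r = sc / sc_pref
    return math.exp(2.0 - r - 1.0 / r)


def reassess_and_select(primitives, sc_pref=50.0, keep=1200):
    """Re-assess (scale_in_pixels, old_assessment) pairs and keep the best ones.

    Returns (scale, new_assessment) pairs sorted by descending new assessment,
    truncated to at most `keep` entries.
    """
    rescored = [(sc, a_old * scale_preference(sc, sc_pref))
                for sc, a_old in primitives]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:keep]
```

At the 0.55 m pixel size used for most images, the preferred scale of 50 pixels corresponds to roughly 27.5 m on the ground, which is a plausible building size.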

Search
The 128-dimensional SIFT descriptors were also used to improve the assessments of the depth-one Gestalten, i.e. |-Gestalten from primitives and ∑-Gestalten from primitives. This follows (Michaelsen, 2014). For the |-Gestalten all pairs of input Gestalten are evaluated. For the ∑-Gestalten the same greedy search strategy was used as in (Michaelsen, 2014), which also starts from pairs but then concentrates on the best continuation.
Only rows with more than 2 members were kept. For this paper no rotary ∏-Gestalten were investigated. The number of higher-order Gestalten of each kind is limited to 500, again in order not to overload the current MATLAB implementation. In this paper the search depth was limited to 2. Figure 2c shows that, starting from primitives of our preferred scale sc_pref, depth-2 Gestalten already fill large portions of the image.
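The two search patterns described above, exhaustive pairing and greedy continuation from a pair, can be illustrated on bare 2-D points. This is a strongly simplified sketch and not the strategy of (Michaelsen, 2014): it ignores scale, orientation, descriptors, and assessments, accepts a continuation by a hypothetical distance tolerance `tol`, and ranks rows by length instead of by assessment. All function names are hypothetical.

```python
import math
from itertools import combinations, permutations


def all_pairs(items):
    """Exhaustive candidate list, as used for the binary mirror operation."""
    return list(combinations(items, 2))


def greedy_row(points, start, second, tol=0.25):
    """Extend the row start -> second by the best continuation at each step.

    The next position is predicted by repeating the last displacement; the
    closest unused point is accepted if it lies within tol * step length.
    """
    row = [start, second]
    used = {start, second}
    while True:
        (x0, y0), (x1, y1) = row[-2], row[-1]
        step = math.hypot(x1 - x0, y1 - y0)
        target = (2 * x1 - x0, 2 * y1 - y0)
        candidates = [p for p in points if p not in used]
        if not candidates:
            break
        best = min(candidates,
                   key=lambda p: math.hypot(p[0] - target[0], p[1] - target[1]))
        if math.hypot(best[0] - target[0], best[1] - target[1]) > tol * step:
            break
        row.append(best)
        used.add(best)
    return row


def find_rows(points, tol=0.25, max_rows=500):
    """Start a greedy row from every ordered pair; keep rows with > 2 members."""
    seen, rows = set(), []
    for a, b in permutations(points, 2):
        row = greedy_row(points, a, b, tol)
        key = frozenset(row)
        if len(row) > 2 and key not in seen:
            seen.add(key)
            rows.append(row)
    rows.sort(key=len, reverse=True)  # stand-in for ranking by assessment
    return rows[:max_rows]
```

Even in this reduced form the sketch shows why a limit such as 500 is useful: every ordered pair seeds a search, so the number of candidate rows grows quadratically with the number of primitives.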

Results
In the following list the results are discussed qualitatively.
There is not enough room to present all results pictorially, so this is done only for selected examples. Assessment statistics are given for all pictures in Tables 1, 2, and 3.
#1: centre of Thimphu, rather irregular; neither the row operation nor the mirror operation yields results in accordance with human perceptual grouping; Table 1 shows low assessments for the rows, which sets the clutter level (about .85 is appropriate); Table 2 gives almost the same for mirror grouping; however, here two of the best instances are located on major Buddhist shrines, and those are well in accordance with saliency to humans.
#2: containing newly built residential house rows; most of the rows of similar new residential houses are found in accordance with human perceptual grouping; long ∑P…P rows (up to eleven members) are instantiated on the salient saw-tooth-roofed workshop; many of the neat symmetries of the houses are also instantiated as P|P in good accordance with human grouping, with assessments higher than .85; some row Gestalten above .85 are found along major straight contours along a road and a sports field, not in accordance with human perception; Table 3 shows no mirror-of-row Gestalt above .95 on this picture, and the best one is actually on the overlaid logos and text.
#3: larger symmetric buildings, according to Google a "royal institute of managing"; one of the complexes is instantiated by a large cluster of P|P-Gestalten including the best one, but even this one has an assessment far below .85; the other salient symmetric building is missed completely; row grouping fails to instantiate the salient triple row of buildings to the west of the institute, and the best rows are on the logos overlaid on the image; Table 3 also shows low assessments for the depth-2 Gestalten; all in all this example image yields almost complete failure of the method, probably due to low contrast.
#4: again salient newly built residential house rows; in accordance with human perception these are mostly instantiated; some rows are oblique and thus not in accordance with human grouping; some illusory row Gestalten are instantiated (e.g. on the logos or on a long straight contour at the bank of a river), and assessments are above the clutter assessment but not by much; though the buildings are very symmetric, automatic mirror grouping fails on this image: the best |-Gestalten are far below threshold and illusory (on the overlaid script), probably because the lighting and shadow casting break the symmetries; the best level-2 symmetry-of-rows Gestalt is just above .95, but it is likewise located on the overlaid script.
#5: very salient group of residential buildings; some results on this picture are given in Figure 2, displaying primitives in green, rows in red, and mirrors in blue; Table 1 already indicates that highly assessed ∑P…P-Gestalten are instantiated here, and these are in high accordance with human perception; on the other hand, P|P-Gestalten do not reach assessments above .85, and in fact the individual buildings are not very symmetric; however, in this case higher-order grouping leads to a satisfying result, as can be seen in Figure 2c, where the only mirror-of-rows Gestalt exceeding the .95 threshold is displayed.
#6: overview over almost all of Thimphu; human perceptual grouping gives here only a large irregular cluster stretching along some major roads (or valleys); the tables indicate that nothing above clutter level is grouped by Gestalt algebraic operations here; since all maximal assessments are far below clutter level, this example can be regarded as a good 'true negative'.
#7: residential area in Phoenix organized in rectangular blocks of about 198 m × 102 m (the whole city is clearly organized in square-mile tiles, i.e. larger than the images at hand); the individual, probably single-family homes are placed in rows but are mutually quite dissimilar in appearance; to human perception the street grid is most salient; Gestalt algebra fails to instantiate anything like that with sufficient assessment, and the maximal assessments all fall far below the clutter thresholds; yet among the best-assessed row and mirror Gestalten most are in some accordance with human vision.
#8: to the east and south this picture shows major motorways crossing; the rest is mostly residential area made up of single-family homes arranged in a square that is also salient to human vision; the system yields below-clutter-level assessments for mirror and second-level Gestalten; Table 1 shows ∑-Gestalten above level .85 in this image; about half of those are in accordance with human grouping, while the others are arranged along long contours given by the motorways; some are also obliquely oriented (where human perception will prefer either horizontal or vertical rows on this picture).
#9: this picture is displayed in Figure 3a; most salient to humans is a triple row of large administration buildings in the north-western part, which is also found by the Gestalt algebra search as the maximally assessed ∑-Gestalt, as can be seen in Figure 3b; threshold .80 was used in the level-1 figures in order to show also false positives and what would still be missing even at this threshold; it can be seen that false rows appear, e.g., on parking grounds and on salient contours; Figure 3c shows that in picture #9 mirror grouping concentrates on the building complex in the south of the image, which happens to be a prison; this complex is indeed also saliently symmetric to human perception.
#10: the large building complex in the south-west of this picture (a medical centre) has neither salient mirror symmetry nor repetitions in rows; accordingly, Gestalt grouping avoids it (although it has high contrast) and concentrates on the residential buildings in the rest of the image, either small row houses or multiple-family homes; much of the automatic Gestalt grouping there is at least in partial accordance with human perception.
#11: residential area arranged in concentric circles around a shopping centre; this is a fairly new area, obviously designed by repeating the same double-family homes along the curved streets; in accordance with human perception the Gestalt assessments are very high here, in fact top-level in all three tables; since the rows bend along the curves, ∑-grouping follows them for only about four instances; then often two successive such level-1 Gestalten are combined into level-2 |-Gestalten, which is the best the system can do in accordance with human perception; for the time being no ∏-Gestalt grouping was tested, although for this image it should be the proper operation.
#12: an overview containing the hospital area of image #10 as well as the salient triple complex of #9; the latter is found again as the top-level ∑-Gestalt (though the scale is four times larger here); some other high-level rows are in accordance with human perception as well, while some are located along contours of a very big motorway crossing, not in accordance with human perception.
#13: same scale as #12 but containing the artificial rotational patterns of #11, which are very salient to human perception; in accordance with this, the assessments found by the automatic Gestalt algebra system are also very high, as can be seen from the tables; some of the top-rated level-2 groupings are mirrors of rows repeating on a larger scale the same construction as in #11, but many are also illusory; almost all highly rated level-1 mirrors are in accordance with human perception, but many are missing; highly assessed level-1 row Gestalten seem a little arbitrary.
#14: again a factor of four in scale (8 m per pixel), so that almost all Gestalt disappears; this is in accordance with the very low assessments found by the automatic system, as can be seen in the tables; the residual top-rated Gestalten either rest on the overlaid logos and script or are illusory.

This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-3-W2-135-2015

DISCUSSION
The experiments show that many perceptual groups are found by the automatic search using Gestalt algebra in good accordance with human perception. Setting a suitable clutter threshold, as done above, will give, next to those 'true positives', also some 'false positives', i.e. illusory groups which are not in accordance with human perception. These often lie on salient long straight contours, a fault due more to the key-point detector, which obviously gives too highly assessed responses at such locations. So by improving the key-point method this should be avoidable. 'False negatives', i.e. Gestalten salient to human observers which are not instantiated above the clutter threshold by the automatic system, have also been found. An example is given by image #3. This image happens to have low contrast, and some of the saliency to human observers results from colour contrasts. Recall that colours are not used by the system up to now; this is a possible future add-on that may well contribute in a way similar to the aid of the SIFT descriptors.
It is also evident, e.g. from Figure 2c, that deeper-level structures, Gestalten of Gestalten, can be found automatically in accordance with human perception. However, as yet, such correctly instantiated hierarchies remain shallow and not very frequent. For the time being, the threshold has to be set much higher for these: .95 as opposed to .85 for the Gestalten of primitives. This is partly due to the use of the SIFT descriptors modifying the assessments. It is planned for future work to incorporate such additional assessment components into the deeper-level Gestalten as well, next to other constraints. But then such nice results as in Figure 2c might well be lost: recall that here a row of four members is set in mirror symmetry with a row of three members, and that the descriptors must be mirrored, which would lead to bad assessments on that instance, also because shadows will not obey the mirror rules.
Shadows are one example where knowledge about the imaging circumstances and the likely nature of objects in the scene may help, and thus the question arises how such knowledge can be used in cooperation or competition with the Gestalt algebra.
Gestalt grouping alone will in any case not give appropriate understanding so far as intelligent urban design is concerned. But the experiment shows that man-made design according to some plan or principle can to some degree be distinguished automatically from rather arbitrary urban structure. Whether the Gestalten then represent a beautiful Buddhist shrine or a symmetric American prison cannot be distinguished, nor whether it is a well-balanced multi-family home according to Benninger's principles or a serious waste of space for a single family needing three cars to get along.
It is necessary to build some recognition on the obtained Gestalten, or to have the Gestalten instantiation working in cooperation with it. E.g., on a smaller scale a vehicle detector will help to identify certain Gestalten as vehicle rows, and thus assign the meaning 'parking lot'. Such detectors may be trained on representative example material or be model-based. And of course a road recognition method will help to measure the amount and importance of individual motor traffic present at the urban structure at hand.

Figure 2. Thimphu scene #5: a) image as taken by virtual globe system; b) rows of primitive Gestalten ∑P…P surpassing assessment 0.8; c) the best mirror-of-row Gestalt

Figure 3. Phoenix scene #9: a) image as taken by virtual globe system; b) rows of primitive Gestalten ∑P…P surpassing assessment 0.8; c) |-Gestalten on the same image

Table 3 shows generally higher assessments for level-2 Gestalten, in this case mirror-of-row Gestalten; here the clutter-level threshold should be set higher, e.g. to .95.

Table 3. Resulting numbers and properties of ∑|∑-Gestalten