AUTOMATIC URBAN 3 D BUILDING RECONSTRUCTION FROM MULTI-RAY PHOTOGRAMMETRY

Over the last 20 years the use of, and demand for, three dimensional (3D) building models has meant there has been a vast amount of research conducted in automating the extraction and reconstruction of these models from airborne sensors. Whilst many different approaches have been suggested, full automation is yet to be achieved and research has suggested that the combination of data from multiple sources is required in order to achieve this. Developments in digital photogrammetry have delivered improvements in spatial resolution whilst higher image overlap to increase the number of pixel correspondents between images, giving the name multi-ray photogrammetry, has improved the resolution and quality of its by-products. In this paper the extraction of roof geometry from multiray photogrammetry will be covered, which underpins 3D building reconstruction. Using orthophotos, roof vertices are extracted using the Canny edge detector. Roof planes are detected from digital surface models (DSM) by extracting information from 2D cross sections and measuring height differences. To eliminate overhanging vegetation, the segmentation of trees is investigated by calculating the characteristics of a point within a local neighbourhood of the photogrammetric point cloud. The results highlight the complementary nature of these information sources, and a methodology for integration and reconstruction of roof geometry is proposed.


INTRODUCTION
The use of and demand for three dimensional (3D) building models has grown significantly over the last twenty years for numerous different uses.The manual reconstruction of buildings from aerial imagery via stereoscopy or from lidar data is a time consuming and laborious task, thus the automation of the extraction and reconstruction of 3D buildings has been a major research focus.Research has exploited both aerial imagery and lidar data for the reconstruction of 3D models at varying levels of detail.One standard for the levels of details is outlined by the Open Geospatial Consortium's CityGML standard which defines five levels of detail, where Level of Detail (LOD) 0 is a DTM (Kolbe, 2009).It is fairly straightforward to automate the procedure for the reconstruction of LOD1 building models, which presumes the roof of all buildings to be flat.However, in order to reconstruct a more faithful roof type and create a LOD2 building model, the process is much harder to automate, and requires the extraction of detailed 3D building geometry.LOD3 and LOD4 are beyond the scope of this research and require extra information such as oblique imagery or terrestrial data to create LOD3 and internal building geometry for LOD4.
Of the many different approaches that have been proposed, currently only semi-automation has been achieved, due in part to insufficient automatic data extraction for rooftop reconstruction (Brenner, 2005).Studies which compare proposed reconstruction methods have suggested that in order to achieve full automation, aerial imagery and lidar data need to be used in synergy in order to utilise their superior positional and height resolutions respectively (Brenner, 2005;Kaartinen and Hyyppa, 2006).However, studies which have combined these two datasets have found that registration offsets between the data can cause errors in the reconstruction phases (Rottensteiner et al., 2004;Awrangjeb et al., 2012).Also the cost of acquiring multiple datasets is expensive and the temporal resolution can vary significantly.
The development of hardware and software for digital photogrammetry over the last 10 years has seen the spatial resolution of the imagery and its by-products increase.The use of digital cameras has allowed the increase in image overlap from 60 forward / 20 lateral with a film camera to 80 forward / 60 lateral due to reduction in material costs (Leberl and Thurgood, 2004).This configuration of images means that an individual pixel can be seen in as many as 15 images, compared to six using the overlap of film images, which is where the name multi-ray photogrammetry comes from.Near infrared, as well as RGB imagery, can be captured at a spatial resolution of 0.1 m and better (Leberl and Thurgood, 2004).Algorithmic developments from computer vision such as semiglobal matching have seen the introduction of pixel to pixel correlation (Hirschmuller, 2005).This has led to highly dense point clouds and associated digital surface models (DSM), rivalling those produced from lidar, being created at the same spatial resolution as that of the imagery (Haala, 2011).The DSM can then be used to produce a true orthophoto which gives sharp boundaries of buildings and high levels of roof detail (Wiechert et al., 2012).This paper addresses the extraction of roof geometry from orthophotos, DSMs and photogrammetric point clouds created from multi-ray photogrammetry.The paper is structured as follows.Section two will discuss previous work on the extraction of roof information for reconstruction of 3D buildings models; section three will present the test site and datasets; section four will outline the methodology used for the extraction of the roof geometry; section five will discuss results and section six will present initial conclusions of the research, before a brief discussion of future work in section seven.

RELATED WORK
The automated reconstruction of 3D building models with correct roof geometry is dependent on the quality of the extracted features.Many different extraction approaches have been investigated.These can be classified as either model-driven or data-driven.Model-driven approaches fit pre-determined roof models to the data but are therefore limited to the number of models in the database, so tend to only reconstruct simple roof types such as flat, ridged or hipped (Tseng and Wang, 2003).On the other hand data-driven approaches tend to be more flexible as they use the extracted data in order to create a roof model (Nex and Remondino, 2012).Whilst different in the approach to construct the final roof model, both methods are dependent on the underlying data and the features extracted from this.
Research has focused on utilising data from aerial sensors including aerial imagery and lidar data because of high accuracy and spatial resolution.It is difficult to reconstruct a reliable 3D model from imagery alone because the roof height cannot be accurately reconstructed (Baillard et al., 1999).On the other hand it is difficult to accurately extract building and roof edges from point clouds due to the point density and DSMs not having sharp boundaries (Huang et al., 2011).Therefore many approaches have integrated both lidar and imagery for this process.Aerial imagery has been used to extract building boundaries as well as ridge and valley lines using point and edge detectors.Wang (2012) presumed roof planes are composed of connected points and lines, thus used a modified Movarec point detector and the Canny edge detector to extract points and lines respectively.The points and lines were then matched between images to reconstruct a 3D model.Awrangjeb et al. (2012) also used the Canny edge detector to extract lines from orthoimagery and classified them using lidar data, the normalized difference vegetation index (NDVI) and entropy images.While many edge detectors have been developed and investigated, a significant body of research has highlighted the strong performance of Canny over other edge detectors, with the ability to detect strong and weak edges (Maini and Aggarwal, 2009).
Approaches using lidar or photogrammetric DSMs have applied planar fitting using algorithms such as the RANSAC algorithm (Schnabel et al., 2007) or least squares planar fitting (Omidalizarandi and Saadatseresgt, 2013).Other approaches have used clustering to segment homogenous regions based upon similarity measures (Dorninger and Pfeifer, 2008).An overview of segmentation of data from point clouds and DSMs can be found in Omidalizarandi and Saadatseresgt (2013).
Whilst surface extraction and planar fitting approaches can accurately detect planes and perform well in the presence of noise, they tend to lead to over and under segmentation.Artefacts in the data and low point density can cause irregular planes and under segmentation (Huang et al., 2011).The use of curvature measures, such as surface normal, within a local neighbourhood tend to produce noisy data and lead to over segmentation (Vosselman and Dijkman, 2001;Rabbani et al., 2006).Therefore post-processing is usually required to rectify incorrect segmentation.
Whilst many approaches have tried to fit planes to extract surfaces, an alternative and under-explored approach is the use of cross sections for segmenting planar features.This is sometimes referred to as scan line segmentation.Research has shown that planes can be extracted by segmenting along cross sections of a surface and then performing region growing.This method was firstly proposed by Jiang and Bunke (1994) who extracted sections across every Y pixel of a range image and then split this into planar segments.These segments were created by fitting a line between the end points of a cross section and splitting the line where the distance from a point on the cross section to the chord of the line is greater than a predetermined threshold.Region growing is then undertaken on the lines, instead of individual pixels, to compute planes.This is computationally less expensive compared to region growing of pixels (Jiang and Bunke, 1994).Haala and Brenner (1997) adopted the scan line approach for the segmentation of a lidar DSM, finding that while it correctly segmented planes, it did not extract the position of breaklines very accurately.However, the DSM used had a grid spacing of 0.5 m, so this was likely to have been a factor.The scan line approach has since been developed for the successful classification of lidar data into ground, building and vegetation using height and distance thresholds (Sithole and Vosselman, 2005).
In order to aid the extraction of roof information additional data can also be integrated where available.To limit the search area for extraction of roof geometry, many studies have incorporated topographic mapping data, which is readily available in many countries in 2D digital map format or can easily be derived (Flamanc et al., 2003).Vector mapping can be used to determine the outer boundary of the roof and limit the search area for extraction.The complexity of the reconstruction can be reduced using these 2D maps as well as the improving the knowledge and reliability of reconstruction (Flamanc et al., 2003;Suveg and Vosselman, 2004) .
The segmentation of trees and buildings has also been another major research topic.Trees which are in close proximity of buildings can cause erroneous data to be extracted and thus hinder the accurate reconstruction of buildings.Many approaches have used near infrared imagery to compute the normalized difference vegetation index (NDVI) to classify vegetation (Awrangjeb et al., 2012).However, in the absence of near infrared imagery or if the imagery is captured during autumn or winter then an alternative must be found.Other approaches have used image colour and height variations to classify vegetation but results are usually incomplete (Nex and Remondino, 2012).In the classification of lidar data, terrain roughness, calculated using the second derivative of the DSM, was used to classify trees (Rottensteiner and Briese, 2002).Filin (2002) used surface characteristics to classify regions in lidar data and determined that vegetation should have a rapidly varying slope where as a smooth surface, such as roof plane, should have a locally constant or fixed plane.
As indicated from this literature review, building reconstruction is a challenging and complex task, which receives continuing research attention.Many factors must be considered, and no one approach or data source is capable of delivering an optimum solution.This research will integrate information from multiple sources, but based on a single photogrammetric data collection in order to present an efficient solution for detailed 3D building reconstruction, with focus on rooftop geometry.

TEST DATA
The data supporting this research was captured for an area of the city of Newcastle upon Tyne, United Kingdom in November 2010 using a Vexcel UltraCam XP camera flown with 80% forward overlap, 60% sidelap, and producing a ground sample distance of 0.1 m.Microsoft UltraMap software was used to derive a photogrammetric point cloud, DSM and true orthophoto at the same spatial resolution as the original imagery.
The imagery contains the city centre of Newcastle upon Tyne as well as surrounding large industrial buildings and residential clusters.There are thus a large range of building and roof types in the dataset.An example of the test site extracted from the orthophoto can be seen in Figure 1; OS MasterMap building topography, available from Ordnance Survey (OS), Great Britain's national mapping agency, was utilised in order to extract buildings.This data is produced through manual digitisation from ground and aerial surveys at a 1:1250 scale, and an accuracy of 1 m within urban areas (Ordnance Survey, 2014).A polygon defines the outline of the building at ground level, so does not take into consideration any roof overhang.
The roof boundary as well as ridge and valley lines were manually digitised to create a reference dataset.

METHODOLOGY
Orthoimagery, DSMs and photogrammetric point clouds are used to extract roof geometry.The 2D positions of the roof edge, valley and ridge lines are detected from the true orthophoto.The methodology can thus be split into the following two sections with the workflow depicted in Figure 2; 1. Extraction of 2D roof features 2. Extraction of 3D roof features The following methodology was tested on a series of roof types including flat, ridged, hipped and more complex structures.The roofs were extracted from the orthophotos, DSMs and photogrammetric point cloud using the OS MasterMap data.However, as the OS MasterMap data does not always take into account the overhang of the roof, a buffer of 1 m was applied to these polygons in order to ensure that the complete roof region was selected for processing.For the extraction of 2D features, nine buildings with different roof types including flat, ridged, hipped and roof with more complex structures, were extracted from the orthophotos.The Canny edge detector was then used to detect roof edges, ridges and valleys.Manually derived roof edges, ridge and valley lines were used as referenced data for statistical analysis.The Euclidean distance measure was used to find if an edge pixel was detected by Canny up to one pixel away from the reference data.
With only 2D information being extracted from the orthophotos, the DSM is used in complement to derive the necessary 3D information.The same buildings were used for the extraction of 3D features.The DSM was used to extract roof planes by measuring height differences along X and Y profiles parallel to major roof edges via scan line segmentation.As not all buildings are oriented with roof edges in north-south and east-west directions, the orientation of the building was calculated using the OS MasterMap building topography data.The orientation was determined by calculating the dominant angle of the building polygon, with respect to north.This orientation angle was then used to rotate the extracted building.The rotation is required in order to construct X and Y profiles in a north-south, east-west direction and parallel to major roof edges.Cross sections were generated to intersect every pixel, and thus sample the DSM at 0.1 m in both X and Y.The height values from the DSM are used to calculate the numerical gradient between neighbouring pixels along a cross section.The numerical gradient is calculated using Error!Reference source not found., which for each pixel takes the height value at the next pixel (A i+1 ) and subtracts the height value from the previous pixel (A i-1 ).This value is then divided by 2 to give a height difference.

∑
(1) The sign of the height difference was then used to determine positive and negative slopes.In order to determine flat planes as well as positive and negative planes, the change in sign of the slope was used to determine a breakpoint.A breakpoint was created at a point where the sign of the slope changes, thus splitting the line segment.The height difference between breakpoints was measured and a threshold T h = 0.5 m was applied.If the height difference was between T h and -T h then the section was classified as flat, if the height difference was greater than T h then the section was a positive slope and if the height difference was smaller than -T h then the section was classified as a negative slope.An example of the cross section of an M shaped roof with a flat section can be seen in Figure 3.
Once the planes have been classified the image is then rotated back to its original alignment using the the orientation angle and placed back into its geographic location using the tiff world file of the original extraction.The extraction of 2D and 3D roof geometry can be affected by neighbouring non-roof features.Trees that are in close proximity to the building or overhang the roof surface influence the classification of the slope of the roof and also obstruct the detection of 2D features by edge detection.Whilst the extraction of buildings using the buffered OS MasterMap data removes most of the neighbouring trees, small parts that overhang the roof are still included in this extraction.Therefore the segmentation of trees and buildings was undertaken.For the segmentation of trees and buildings, 12 buildings, with different roof structures similar to those extracted before, and 12 trees, of varying size, shape and type, were manually extracted to measure curvature and roughness.The work of Filin (2002) and Rottensteiner and Briese (2002) shows that trees can be successfully classified based on slope and roughness.Therefore, in order to segment trees from buildings, local characteristics of the photogrammetric point cloud are calculated.Curvature, defined as fitting a quadratic curve to the points and returning the mean Gaussian curvature at a point was calculated.Surface roughness, defined as the distance of a point to a least-squares planar fit, was also calculated.Both these measures were calculated using different neighbourhood sizes.Neighbourhoods from 0.5 m to 4.5 m in increments of 1 m were used to determine an optimum size for calculating these statistics, taking into account the processing time and the difference in values for trees and buildings.A maximum of 4.5 m was used to coincide with the size of the smallest feature that would reasonably be expected to be detected.The value associated with an individual point is then used to compute a median value for curvature and roughness for each investigated building type.The calculation of roughness and curvature was carried out in CloudCompare version 2.5.4.1 which is an open-source software package for point cloud processing (CloudCompare, 2014).

Extraction of 2D Roof Features
Euclidean distance evaluation was used to statistically analyse the edge detection results.A pixel is successfully detected if the pixel coincides to within one pixel of reference data.The percentage of edge pixels detected for the nine test sites can be seen in Table 1, which shows approximately 77% of edges were correctly detected.For nearly all test sites, edges were extracted corresponding to over 70% of manually digitized reference data, with only two test sites showing poorer performance.One test site, for which less than 50% of edges were extracted, included the edge of the building but only part of the ridge line.This is believed to be due to the poor contrast between neighbouring roof planes.The edge detection image in Figure 4 shows that whilst the percentage of roof edges detected may be approximately 77%, there are many edges extracted that do not refer to roof features (false positives).This is due to additional texture in the image as seen in Figure 4a and shadow as seen in Figure 4b.

Extraction of 3D Roof Features
By generating X and Y profiles, roof planes can be determined by using a height difference threshold of T h = 0.5 m.A visual analysis of the results showed that roof planes have been classified successfully, as shown in Figure 5. Small sections referring to chimneys (Figure 5a) and dormer windows (Figure 5b) have also been extracted, which are usually not segmented in planar fitting procedures because they are smaller than the point density or are not considered planar (Sampath and Shan, 2010).The sloped planes have been correctly segmented from the Y profiles of Figure 5a (right) but there are also small sections, predominately at the ridge lines that have been classified as flat.This again is due to noise in the DSM.This is also evident in Figure 5b where it can be seen the ridge line in the Y profiles have been partly classified as flat.Figure 5b also shows that the height difference threshold has removed the corner sections of non-rectangular planes, which occur on hipped roofs and with dormer windows.The small distance between these ridge points means that the height difference is smaller than the threshold so the plane is slightly undersegmented.This is evident in Figure 5b using the X profiles where the corner sections of the small hipped planes have been classified as flat.However, the X and Y complement one another and are necessary in order to classify a roof.While the M shaped roof (Figure 5a) can be classified to a high degree of success using cross sections in one direction, other roofs required X and Y profiles for classification.Sloped planes in the X direction of a hipped roof are classified as flat when measuring along Y, as can be seen in Figure 5b.The calculation of height differences between pixels along X and Y profiles and then classifying them into positive, negative and flat sloped planes is computational fast.The whole process takes a matter of seconds. a.
b. Figure 5. X (left) and Y (right) profiles of a. ridged roof with a flat section b. a hipped roof with dormer windows segmented into positive slopes (yellow), negative slopes (blue) and flat (red).

Segmentation of Trees and Buildings
When comparing trees to buildings results show that regardless of neighbourhood size, for a building with a flat roof or a single ridged roof, trees have a higher curvature and roughness value, as would be expected.However, as the complexity of the roof increases (M-Shaped, Hipped and Complex) the curvature and roughness values become similar to that of trees.

As detailed in
Test Site/ Neighbourhood size 0.5m 1.5m 2.5m 3.5m 4.5m Flat 0.00 0.00 0.00 0.00 0.00 Flat 2 0.00 0.00 0.00 0.00 0.00 Flat 3 0.00 0.00 0.00 0.00 0.00 Flat 4 0.00 0.00 0.00 0.00 0.00 Multiple Flat 0.00 0.00 0.00 0.00 0.00 Ridged 0.00 0.00 0.00 0.00 0.00 Ridged 2 0.00 0.00 0.00 0.00 0.00 Ridged 3 0.00 0.00 0.00 0.00 0.00 M-Shaped 0.00 0.00 0.01 0.01 0.01 Cross Gable 0.00 0.00 0.00 0.00 0.00 Hipped 0.00 0.01 0.01 0.02 0.02 Table 2 there is little difference in curvature values between buildings and trees at all test sites.The general trend for trees is as the neighbourhood size increases the curvature value increase.However, this does not hold for buildings with simple roof structures, such as flat and ridged, having a median curvature of 0 m for all neighbourhood sizes.Whilst the median curvature of more complex buildings (Hipped and Complex) increases with neighbourhood size, the values are very similar to those of some trees.This lack of distinction has shown that curvature cannot not be used to segment trees and buildings.
The results of the roughness tests, as presented in Table 3, exhibit a greater difference between trees and buildings at various neighbourhood sizes.For all test sites median roughness increases as the neighbourhood increases with simpler roof types having a small median roughness compared to more complex roofs.When comparing trees and buildings, trees have a larger median roughness than buildings.With a neighbourhood of 0.5 m it can be seen that complex roofs have the same median roughness as some trees but this difference increases as the neighbourhood size increases.However, it should be noted that the processing time increases as the neighbourhood size increases.
By taking into account the processing time and the difference between trees and buildings, an optimal neighbourhood of 2.5 m was chosen for roughness to threshold the data.At this neighbourhood size, the largest median roughness of buildings is 0.14 m whilst the smallest median roughness of trees is 0.16 m, as shown in Table 3.By using T r = 0.16 m as a threshold, the results of the segmentation are illustrated in Figure 6.It can visually be seen from Figure 6a that the tree has been classified correctly in the test site.However this threshold level has also extracted the ridge lines, the edge of the roof and the edge where two neighbouring planes meet.Figure 6b shows similar results, however, this time the tree classification is very patchy with only parts of the tree classified as being above the threshold T r .Again the ridge line and the edge of the roof have been extracted within this threshold level.Whilst this indicates the value of roughness for segmenting trees, more information must be incorporated to mitigate the inclusion of building features in the segmentation.It is possible that the 2D edge detection results from Section 5.1 may prove useful in this regard, and this will be explored further.Furthermore, a wealth of information could potentially be derived from multi-ray photogrammetrye.g.radiometric properties and absolute height difference, and these could be incorporated to aid the classification and segmentation.Figure 6.Buildings with neighbouring trees showing areas above the threshold (red) and below the threshold (grey)

CONCLUSION
This paper has shown that a range of 2D and 3D information can be extracted from orthoimagery, DSMs and photogrammetric point clouds created from multi-ray photogrammetry.The statistical analysis shows on average 77% of edges are successfully extracted using the Canny edge detector.By taking X and Y profiles of a DSM to measure the height difference between pixels, roof planes and small features can be extracted.However these individual approaches cannot be used in isolation for the reconstruction of 3D roofs.The edge detection shows a series of non-roof edges being extracted and the roof plane segmentation shows incorrect classification of break lines.However, by integrating the two datasets, as shown in Figure 7, it can be seen that there is some coherence.Break lines of planes have been detected as edges from the orthophoto and surplus detected edges occur on the plane.Therefore edges that occur on a plane could be removed and those edges that are left could be used to help aid the planar detection of roofs.Whilst the measure of curvature was unable to segment trees and buildings, local roughness showed some promise in the segmentation.However, by using the median roughness as a threshold it was demonstrated that whilst most of the trees can be classified with this threshold, important parts of the roof structure are also segmented along with the tree.Thus, whilst roughness is clearly of value in classification of trees, more information will be required to refine this and correctly segment trees from buildings.

FUTURE WORK
While this paper has shown that roof geometry can be extracted using edge detectors and roof cross sections both these methods have limitations that need to be addressed.The edge detection has shown high rates of success for the extraction of roof edges but also includes large amounts of noise.Morphological operators, including erosion and dilation, will be included to try and remove this noise.The current segmentation successfully classifies flat sections as well as positive and negatively sloped sections but does not give the accurate location of ridge lines, with some ridge lines being classified as flat sections.Therefore the threshold of height differences between end points of a line segment for the classification of flat, positive and negative sloped planes will be investigated to try and reduce this over segmentation.Another limitation of this method is by only measuring height differences, it is not possible to completely reconstruct all roof types.For example, measuring the slope of a Mansard roof, which has two positive and negative roof planes at different angular inclinations, would segment these two planes as one using only slope and classify the roof as a ridged roof.By measuring the angular change in slope two positively sloped roof planes could be separated.Thus the scan line approach will be adapted to include the measure of angular change.This could be used in synergy with the height threshold to improve the classification of roof planes and hopefully improve the classification of breaklines.
However, one strategy to be applied for both the planar detection and edge detection will be to use the strengths of each dataset to remove erroneous features in the other.Figure 7 shows good coherence between the edge detection and the planar segmentation, and in order to improve the automation of roof reconstruction, data from different data sets and sources should be integrated.The ridge line detection from the planar segmentation of cross sections could be improved with the inclusion of edge detection and the reduction of noise could be undertaken by finding edges that occur on a plane and remove them.
All of the extracted information will be integrated to create the roof model.A LOD2 building model will be achieved by combining the roof model with a LOD1 building model created from the OS MasterMap building topography.

Figure 1 .
Figure 1.Orthophoto showing the city centre of Newcastle upon Tyne at 1:4500 ©UltraMap XP Image Copyright 2010, Ordnance Survey

Figure 2 .
Figure 2. Workflow for extraction of roof geometry

Figure 3 .
Figure 3. Example of a cross section extracted from a roof

Figure 4 .
Figure 4. Edge detection results (red) from a. Flat 2 building and b.M-Shaped building overlaid onto orthophoto

Table 1 .
Percentage of edges detected compared to manually digitised reference data using the Canny edge detector