POTENTIAL OF DENSE MATCHING FOR THE GENERATION OF HIGH QUALITY DIGITAL ELEVATION MODELS

Until recently, the acquisition of high quality Digital Elevation Models was dominated by airborne LiDAR. However, the increasing quality of digital airborne cameras, in combination with recent improvements in matching algorithms, now makes automatic image-based collection a suitable alternative. Within the paper, this progress is demonstrated using the example of photogrammetric DEM generation with the Semi-Global Matching (SGM) stereo method. Since this approach aims at pixel-wise matching, dense 3D point clouds can be generated. The tests described in the paper are based on data collected with different digital airborne cameras in various flight scenarios during a recent test on photogrammetric 3D data capture. By these means, the impact of different stereo configurations on the quality of the final outcome can be evaluated and compared to already available test results. Special interest is also paid to the analysis and combination of multiple stereo image pairs with different base-to-height ratios, which can be used efficiently to increase the accuracy and reliability of the matching result.


INTRODUCTION
Digital image matching for automatic point transfer is a well-known standard procedure within photogrammetric software tools. However, in contrast to the great relevance of image matching within automatic aerial triangulation, the importance of this technique for 3D surface reconstruction has long been subordinate. Especially when aiming at very accurate and dense Digital Surface Models (DSM), image-based surface reconstruction was frequently outpaced by airborne LiDAR. Meanwhile, this gap is diminishing. As an example, digital aerial cameras can provide highly overlapping airborne imagery with good dynamic range and signal-to-noise ratio as standard data sets. This is highly beneficial for automatic image matching, especially for surfaces with relatively little texture. Consequently, the quality and accuracy of image-based point transfer as the basic observation for 3D surface reconstruction have improved considerably, as also verified within recent tests (Haala et al., 2010). Thus, digital image matching has established itself as a valid alternative to airborne LiDAR.
The current comeback of image-based DSM generation has additionally triggered a renaissance in dedicated software developments. Within existing commercial software tools, stereo image matching usually applies algorithms for feature extraction and matching. This approach is potentially fast and reliable, but can lead to an inhomogeneous distribution of matched points; regions with a rather limited number of 3D points occur especially in areas of low image texture. Ideally, the DSM resolution should only be limited by the ground sampling distance of the available imagery. However, this requires a matching result for each image pixel. In order to cope with the general ambiguity of such a per-pixel measurement, additional constraints, such as the assumption of a smooth surface, are usually introduced. Algorithms that globally minimize the matching costs between corresponding pixels together with the respective smoothness constraints are called global image matching methods. While they provide good results in terms of quality and resolution, they usually suffer from high computational complexity, which results in rather low performance. However, this complexity can be reduced significantly by the Semi-Global Matching (SGM) stereo method proposed by Hirschmüller (2008). This approach, which is used in our investigations, approximates a global approach by minimizing matching costs that are aggregated along a certain number of 1D path directions through the image. Since disparity jumps are penalized by additional costs, smooth disparity courses along the paths, and thus homogeneous surfaces, are favoured. The pixel-wise SGM approach provides a dense point distribution, while the global approximation along paths enables a reasonable runtime on large imagery.
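For reference, the energy that SGM approximates can be written in the notation of Hirschmüller (2008). Here $C(\mathbf p, D_{\mathbf p})$ is the pixel-wise matching cost of pixel $\mathbf p$ at disparity $D_{\mathbf p}$, $N_{\mathbf p}$ its neighbourhood, $T[\cdot]$ an indicator that equals 1 if its argument is true, and $P_1 \le P_2$ are the two smoothness penalties. These symbols are taken from that paper and are not used elsewhere in this text:

$$E(D) = \sum_{\mathbf p} \Big( C(\mathbf p, D_{\mathbf p}) + \sum_{\mathbf q \in N_{\mathbf p}} P_1\, T\big[|D_{\mathbf p} - D_{\mathbf q}| = 1\big] + \sum_{\mathbf q \in N_{\mathbf p}} P_2\, T\big[|D_{\mathbf p} - D_{\mathbf q}| > 1\big] \Big)$$

Instead of minimizing $E$ over the whole image, SGM evaluates the cost recursively along each path direction $\mathbf r$ and sums over all directions:

$$L_{\mathbf r}(\mathbf p, d) = C(\mathbf p, d) + \min\Big( L_{\mathbf r}(\mathbf p - \mathbf r, d),\; L_{\mathbf r}(\mathbf p - \mathbf r, d - 1) + P_1,\; L_{\mathbf r}(\mathbf p - \mathbf r, d + 1) + P_1,\; \min_i L_{\mathbf r}(\mathbf p - \mathbf r, i) + P_2 \Big) - \min_k L_{\mathbf r}(\mathbf p - \mathbf r, k)$$

$$S(\mathbf p, d) = \sum_{\mathbf r} L_{\mathbf r}(\mathbf p, d)$$

The small penalty $P_1$ permits adaptation to slanted or curved surfaces, while the larger $P_2$ preserves discontinuities by penalizing all larger disparity jumps equally; subtracting the path minimum only keeps the values bounded and does not change the position of the minimum.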
The potential of the algorithm has already been demonstrated for different applications and data sets, including aerial images, satellite data and video sequences. Inspired by these promising results, the authors implemented the Semi-Global Matching method, including the Mutual Information matching cost calculation, the cost aggregation step, the disparity post-processing procedures and a triangulation into object space. Within the paper, an accuracy and performance evaluation of this algorithm is presented. The investigations are based on data sets from the recent project on Digital Photogrammetric Camera Evaluation, which was initiated by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF). It contains imagery from several large-format aerial cameras flown over the same test site at two flying heights (Cramer, 2010). For comparison, data from a LiDAR system is additionally available.
Within the paper, the basic principles of the implemented SGM approach are briefly introduced in section 2. The main part of the paper, section 3, then evaluates the potential of this algorithm for image-based dense DEM generation in several configurations. By analysing multiple stereo image pairs with different base-to-height ratios, the general trade-off between good intersection geometry for large base-to-height ratios and simplified matching due to the greater similarity of image content for short baselines is evaluated. The matching accuracy for these different stereo image configurations is determined for test areas defined by a planar sports field and a built-up area of higher geometric complexity. The availability of highly overlapping imagery is additionally used for consistency checks to evaluate matching quality and to eliminate erroneous matches. These investigations, based on redundant 3D point measurements from multiple stereo pairs, are also used to evaluate the potential of SGM for the different camera systems and illumination conditions available in the DGPF test data set. The results are also compared to point clouds from airborne LiDAR and from the commercial software tool MATCH-T DSM as a reference.

DENSE DEM GENERATION
Within this section, the basics of the implemented Semi-Global Matching approach and the subsequent post-processing are presented.

Semi-Global Matching
Our implementation of the SGM algorithm largely follows Hirschmüller (2008). The aim of this pixel-wise matching is to relate each pixel coordinate $p_i$ in the base image to its corresponding pixel coordinate $q_i$ in the match image.

Each relation $(p_i, q_i)$ induces a matching cost. The sum of all costs defines the global matching cost, which is assumed to be minimal for optimal image alignment. Since minimizing the global matching cost for 2D images is known to be NP-hard, SGM instead minimizes an approximation of the global cost. Using the exterior and interior camera orientations from bundle block adjustment, the epipolar line $e_i(p_i)$ induced in the match image by $p_i$ can be determined. Thus, the search for potential corresponding pixels can be limited to $q_{i,j} = e_i(p_i, d_{i,j})$. The disparity $d_{i,j}$ specifies the actual position on $e_i$ and therefore the parallax. The minimal costs $c_{i,j}(p_i, d_{i,j})$ of the set of potential correspondences define a first estimate of the disparity. Unfortunately, these cost minima are often not distinctive and result in wrong parallax estimates. This problem can be solved by cost aggregation. For this purpose, an $N \times M \times D$ cost structure containing the costs $c_{i,j}(p_i, d_{i,j})$ is assembled. It contains the matching costs of the $N \times M$ base image pixels and their $D$ potential correspondences in the match image. The final aggregated matching cost $s_i(p_i, d_{i,j})$ of a base image pixel and its potential correspondence is derived by a weighted summation of minimal costs along 16 image paths $r_k$ ending in $c_{i,j}(p_i, d_{i,j})$. Penalty terms are introduced to force these cost paths to be smooth; however, distinctive minima maintain the possibility of non-continuous paths. These accumulated costs represent an approximation of the global costs. The final correspondence $q_i$ of a base image pixel $p_i$, and therefore the parallax, is defined by the disparity minimizing its aggregated costs, $\min_j s_i(p_i, d_{i,j})$. Sub-pixel accurate disparity estimates are derived by quadratic curve fitting, and the results are stored in disparity images. Within all tests, the Mutual Information (MI) matching cost was used. MI is based on entropies and joint entropies and can be used as a measure for the similarity of two images. Correspondences are chosen such that the MI of the two images, i.e. their similarity, is maximized.
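The aggregation, winner-takes-all selection and quadratic sub-pixel refinement described above can be illustrated on a single scanline. The following is a minimal sketch, assuming pixel-wise costs are already given (MI cost computation is omitted), and using only two of the 16 path directions (left-to-right and right-to-left); the helper names, the penalty values `p1`/`p2` and the toy cost values are hypothetical choices for illustration, not the authors' implementation.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate pixel-wise costs C(p, d) along one 1D path (left to right).

    costs[x][d] holds the matching cost of scanline pixel x at disparity d.
    Returns L_r(p, d) computed with the SGM path recursion.
    """
    n_disp = len(costs[0])
    agg = [list(costs[0])]  # the first pixel on the path has no predecessor
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = min(prev[d], prev_min + p2)      # same disparity or large jump
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # disparity change of one pixel
            if d < n_disp - 1:
                best = min(best, prev[d + 1] + p1)  # disparity change of one pixel
            # subtracting prev_min keeps values bounded without moving the minimum
            row.append(costs[x][d] + best - prev_min)
        agg.append(row)
    return agg


def subpixel_wta(s):
    """Winner-takes-all over aggregated costs s[d], refined by a parabola fit."""
    d = min(range(len(s)), key=s.__getitem__)
    if 0 < d < len(s) - 1:
        left, centre, right = s[d - 1], s[d], s[d + 1]
        denom = left - 2 * centre + right
        if denom > 0:
            # vertex of the parabola through (d-1, left), (d, centre), (d+1, right)
            return d + (left - right) / (2 * denom)
    return float(d)


def sgm_scanline(costs, p1=1.0, p2=4.0):
    """Sum two opposite path directions and pick sub-pixel disparities."""
    forward = aggregate_path(costs, p1, p2)
    backward = aggregate_path(costs[::-1], p1, p2)[::-1]
    return [subpixel_wta([f + b for f, b in zip(fw, bw)])
            for fw, bw in zip(forward, backward)]


# Toy scanline: the true disparity is 2 everywhere, but pixel 2 has a
# spurious cost minimum at d = 0 that a purely pixel-wise search would pick.
costs = [[5, 5, 0, 5, 5] for _ in range(5)]
costs[2] = [0, 5, 1, 5, 5]
print(sgm_scanline(costs))  # [2.0, 2.0, 2.0, 2.0, 2.0]
```

The toy example shows the effect that motivates cost aggregation: the non-distinctive minimum at pixel 2 is overruled by the smoothness penalties accumulated along the paths.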

Object point triangulation and multi-stereo matching
Matching yields parallaxes, which link corresponding pixels between two images. Since interior and exterior orientations are known from aerial triangulation (AT), the coordinates $X$ of the corresponding object point can be computed by triangulation. This problem can be formulated as a linear homogeneous system $AX = 0$ as described in (Hartley et al., 2004), which is solved by a least-squares estimation. By matching overlapping images, redundant measurements $x_i$ of the same object point are obtained. Matching $n$ images against the same base image results in $n+1$ image points for the object point $X$. All measurements can be related using epipolar geometry and the disparity estimates derived by SGM. Again, the triangulation problem can be formulated as an over-determined linear homogeneous system and solved using least-squares techniques. Reprojecting $X$ into the images results in the image coordinates $\hat{x}_i$. If the reprojection error $\lVert x_i - \hat{x}_i \rVert_2$ is larger than a threshold that depends on the base-to-height ratio, the measurement $x_i$ is regarded as invalid and is discarded. Iterative triangulation and evaluation of reprojection errors lead to the final object coordinates $X$. The latter strategy will be referred to as multi-stereo matching.
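The multi-view triangulation with reprojection-error filtering can be sketched as follows. This is a minimal sketch under several assumptions: it solves the inhomogeneous least-squares form of $AX = 0$ (fixing the homogeneous coordinate to 1, a standard variant in Hartley and Zisserman) instead of an SVD, it removes the single worst view per iteration as one possible reading of "iterative triangulation", and it uses a fixed pixel threshold rather than the base-to-height-ratio-dependent threshold of the paper. All function names and the normal-case camera matrices in the usage example are hypothetical.

```python
import math


def solve3(m, v):
    """Solve the 3x3 system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [b] for row, b in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def project(P, X):
    """Project object point X with a 3x4 projection matrix P."""
    Xh = list(X) + [1.0]
    u, v, w = (sum(p * c for p, c in zip(row, Xh)) for row in P)
    return (u / w, v / w)


def triangulate(cams, pts):
    """Linear triangulation from the rows of A X = 0 with X = (X, Y, Z, 1)."""
    rows = []
    for P, (x, y) in zip(cams, pts):
        rows.append([x * P[2][j] - P[0][j] for j in range(4)])
        rows.append([y * P[2][j] - P[1][j] for j in range(4)])
    # Normal equations of the inhomogeneous form A[:, :3] X = -A[:, 3].
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [-sum(r[i] * r[3] for r in rows) for i in range(3)]
    return solve3(ata, atb)


def multi_stereo_point(cams, pts, max_err_px, min_views=3):
    """Re-triangulate after dropping the worst view until all errors are small."""
    cams, pts = list(cams), list(pts)
    while len(cams) >= min_views:
        X = triangulate(cams, pts)
        errs = [math.dist(project(P, X), p) for P, p in zip(cams, pts)]
        worst = max(range(len(errs)), key=errs.__getitem__)
        if errs[worst] <= max_err_px:
            return X
        del cams[worst], pts[worst]
    return None  # too few consistent views: the point is discarded


# Hypothetical normal-case cameras (focal length 1000 px) along one strip.
f = 1000.0
cams = [[[f, 0, 0, -f * bx], [0, f, 0, 0], [0, 0, 1, 0]] for bx in (0, 20, 40, 60, 80)]
truth = (10.0, 5.0, 100.0)
pts = [project(P, truth) for P in cams]
pts[2] = (pts[2][0] + 30.0, pts[2][1])  # simulated mismatch of 30 px
X = multi_stereo_point(cams, pts, max_err_px=0.5)
print(X)
```

`min_views=3` mirrors the requirement, stated later in the paper, that a valid object point must be determined in at least three images; the mismatched view is removed and the point is recovered from the remaining four.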

Elimination of mismatches
Despite the filtering of erroneous measurements within the multi-stereo approach described in the previous section, additional filters were applied to remove wrong disparity estimates. First, a consistency check is performed by simply interchanging the roles of base and match images for image matching. Only disparity estimates consistent with this forward-backward matching are considered valid. Further erroneous disparity estimates are removed by filtering the disparity images with conservative smoothing (Jain, 1986) and a subsequent occlusion check (Hirschmüller, 2008).
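The forward-backward consistency check can be sketched on one epipolar scanline. This minimal sketch assumes that a base pixel `x` with disparity `d` corresponds to match pixel `x - d`, that the reverse matching stores disparities with the same sign convention, and that a tolerance of one pixel is acceptable; the function name and the tolerance are assumptions for illustration.

```python
def left_right_check(disp_base, disp_match, max_diff=1.0):
    """Keep only base-image disparities confirmed by the reverse matching.

    disp_base[x]:  disparity of base pixel x; its match lies at x - d.
    disp_match[x]: disparity from matching with the roles swapped; match
                   pixel x corresponds to base pixel x + d.
    Rejected or missing values are returned as None.
    """
    checked = []
    for x, d in enumerate(disp_base):
        xm = None if d is None else round(x - d)
        ok = (
            xm is not None
            and 0 <= xm < len(disp_match)
            and disp_match[xm] is not None
            and abs(disp_match[xm] - d) <= max_diff
        )
        checked.append(d if ok else None)
    return checked


base = [2, 2, 2, 2, 0, 2]
match = [2, 2, 2, 2, 2, 2]
print(left_right_check(base, match))  # [None, None, 2, 2, None, 2]
```

Pixels whose match falls outside the match image and the inconsistent disparity at position 4 are invalidated.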

PERFORMANCE OF SEMI-GLOBAL MATCHING
For all investigations presented in this paper, image data from the DGPF project for the evaluation of photogrammetric aerial camera systems was used. This test data set consists of imagery from several airborne camera systems with nominal ground sampling distances (GSD) of 20 cm and 8 cm (Cramer, 2009). For reasons of simplicity, our tests were limited to the 8 cm GSD imagery. The first test area, 'Sportsfield', depicted in Figure 2, is a planar football ground that was already used for evaluations of commercial matching software (Haala et al., 2010). Its simple geometry eases accuracy evaluations, which were additionally compared to LiDAR data captured with a Leica ALS50. Since it features rather low surface texture, it is also very well suited to evaluate the performance of matching approaches in potentially challenging areas.
Basic stereo matching provides two corresponding image coordinates. To improve the accuracy of the respective object point, however, multiple matches can be merged by using redundant measurements for 3D point triangulation. In order to investigate the effects of different base-to-height ratios and to demonstrate such benefits of multi-stereo matching, four image pairs with base lengths varying from 200 m to 800 m at a constant flight height of 1200 m were used for multi-stereo matching, and internal accuracies are analysed in section 3.1.
These investigations on varying base-to-height ratios are completed in section 3.2 using imagery of more complex geometric structures. As visible in Figure 3, the second test area, 'Castle', provides such structures: it consists of complex roof geometries, planar surfaces and vegetation such as trees and vineyards. Section 3.3 evaluates the robustness of SGM with respect to different sensors and flight-specific variations such as illumination changes, while a comparison of SGM with the commercial software tool MATCH-T (Lemaire, 2008), which applies feature-based matching algorithms, is presented in section 3.4. Finally, section 3.5 discusses the potential increase in completeness of the reconstructed point clouds obtained by using multiple base images and fusing the resulting point clouds.
Figure 3: Test region 'Castle' with complex geometries and diversified texture.

Accuracy improvement by multi-stereo matching
In a first test, the benefits of multi-stereo matching regarding point density and absolute accuracy were examined. For this purpose, five UltraCamX images of the same flight strip depicting the test area 'Sportsfield' were matched. Point clouds were generated from single stereo matching results (Config. 1 & 2, Figure 4) and using the multi-stereo approach (Config. 3 & 4, Figure 4). Obviously, the 3D reconstruction of a planar surface is rather favourable for the SGM algorithm, since planarity can be enforced by setting large values for the smoothness constraint. In order to guarantee adequate smoothness settings, the area adjacent to the sports field was matched, and the penalties were chosen such that a sufficiently detailed surface was obtained there. The corresponding penalty settings were used throughout this test. In order to evaluate absolute accuracies, point clouds derived by SGM are compared to LiDAR data. For this purpose, a plane was fitted to the LiDAR point cloud. In order to eliminate outliers, the standard deviation $\sigma_z$ of the point-to-plane residuals $r_i$ was computed, and all points for which $|r_i| > 3\sigma_z$ were discarded. Since the field has only small height differences, residuals between LiDAR points and the estimated plane can be approximated by the differences of their vertical components. The filtered LiDAR measurements were used to estimate the final reference plane, to which the SGM point clouds were compared. After triangulation, outliers were removed using the same approach as for the LiDAR data. The resulting standard deviations for raw and filtered point-to-plane residuals are displayed in Figure 4. Large values of the unfiltered $\sigma_z$ are a result of mismatches. For the multi-stereo configurations 3 and 4, the standard deviations of raw and filtered residuals are almost identical. This demonstrates that mismatches are removed efficiently within the multi-stereo matching approach. Furthermore, absolute accuracies can be improved from 10 cm to 5 cm by incorporating redundant measurements.
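The evaluation procedure (plane fit, $3\sigma_z$ outlier removal, refit, then raw and filtered $\sigma_z$ of the SGM residuals) can be sketched as below. This minimal sketch assumes vertical residuals, as justified above for the nearly flat field, solves the normal equations of $z = ax + by + c$ with Cramer's rule, and uses `statistics.stdev` as the unbiased standard deviation; the function names and the synthetic points are hypothetical, not the DGPF data.

```python
import statistics


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(pts):
    """Least-squares plane z = a*x + b*y + c (vertical residuals), Cramer's rule."""
    rows = [(x, y, 1.0) for x, y, _ in pts]
    n = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * p[2] for r, p in zip(rows, pts)) for i in range(3)]
    d = det3(n)
    return tuple(
        det3([[rhs[i] if j == k else n[i][j] for j in range(3)] for i in range(3)]) / d
        for k in range(3)
    )


def residuals(pts, plane):
    a, b, c = plane
    return [z - (a * x + b * y + c) for x, y, z in pts]


def three_sigma_filter(pts, plane):
    """Drop points whose residual exceeds 3 times the unbiased std of all residuals."""
    res = residuals(pts, plane)
    limit = 3 * statistics.stdev(res)
    return [p for p, r in zip(pts, res) if abs(r) <= limit]


def reference_plane(lidar_pts):
    """Fit, remove 3-sigma outliers, and refit on the filtered LiDAR points."""
    return fit_plane(three_sigma_filter(lidar_pts, fit_plane(lidar_pts)))


def evaluate(sgm_pts, plane):
    """Raw and filtered std of point-to-plane residuals against the reference."""
    raw = statistics.stdev(residuals(sgm_pts, plane))
    filtered = statistics.stdev(residuals(three_sigma_filter(sgm_pts, plane), plane))
    return raw, filtered


# Synthetic tilted field with +/-1 cm checkerboard noise and one 2 m outlier.
lidar = [(i, j, 0.01 * i + 0.02 * j + 5 + (0.01 if (i + j) % 2 == 0 else -0.01))
         for i in range(10) for j in range(10)]
lidar.append((4.5, 4.5, 0.01 * 4.5 + 0.02 * 4.5 + 5 + 2.0))
plane = reference_plane(lidar)
print(plane, evaluate(lidar, plane))
```

As in the text, a large raw standard deviation next to a small filtered one indicates that the raw value is dominated by a few gross errors.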

Influence of varying base-to-height ratios
In order to examine the influence of different base-to-height ratios on SGM, four image pairs with base-to-height ratios varying from 0.125 to 0.5 were matched. The imagery was captured by the UltraCamX with a GSD of 8 cm. The test area depicts a castle surrounded by vegetation (Figure 3). Three point clouds were derived from the four stereo models, each triangulated from the disparity estimates of two stereo models. The respective image configurations and corresponding point clouds are displayed in Figure 5. Filtering as discussed in sections 2.2 and 2.3 was applied, and reconstructed points inducing reprojection errors larger than 0.3 pixels were eliminated. Triangulating object points from two stereo models is convenient for evaluation purposes, since the linear homogeneous system for object point computation is over-determined. As a result, the standard deviations of the object point coordinates can be estimated from the covariance matrices of the least-squares solution. In the following, planimetric and vertical standard deviations are denoted by $\sigma_{xy}$ and $\sigma_z$. Large base-to-height ratios entail increased dissimilarity between base and match images, mainly due to changes in perspective and variations in illumination. As a consequence, image matching performs worse and the number of successfully matched points decreases significantly (Figure 5). Since the implemented filter algorithm requires an object point to be detected in at least three images, the number of reconstructed points is limited mostly by the success rate of the wide-baseline image pair. For Config. 1, for example, the success rates of matching the close and wide baseline image pairs amount to 82% and 60%, respectively, and after filtering within the multi-stereo approach only 48% of all points remain. On the other hand, the geometric configuration of large base-to-height ratio imagery is advantageous, and the accuracy of triangulation can be improved. As visible in Figure 6, the standard deviations in object space decrease with increasing base-to-height ratio. Thus, the beneficial geometric properties compensate for the poorer matching performance of SGM. However, wide-baseline results are rather incomplete and contain a larger number of mismatches that are not eliminated by the filters. Matching performance is evaluated using error propagation: the standard deviations $\sigma_z$ are propagated back into image space, yielding the matching accuracy $\sigma_I$. The respective standard deviations were calculated for the small-baseline image pairs of each configuration. It has to be kept in mind that the object points were derived from two stereo models, so the absolute values do not represent the actual accuracy of SGM.
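The trade-off between intersection geometry and matching accuracy can be made concrete with the stereo normal case. From $Z = cB/p$ it follows that $\sigma_Z = \frac{Z^2}{cB}\sigma_p = \frac{H}{B}\cdot\frac{H}{c}\sigma_p$, and since $\frac{H}{c}$ times the pixel size equals the GSD, $\sigma_Z = \frac{H}{B}\,\mathrm{GSD}\,\sigma_d$ with $\sigma_d$ in pixels. The sketch below assumes this simplified normal-case model; it is not the covariance-based multi-view propagation used in the paper, and the function names and the example value $\sigma_d = 0.2$ px are hypothetical.

```python
def depth_std(base_to_height, gsd_m, sigma_disp_px):
    """Stereo normal case: sigma_Z = (H / B) * GSD * sigma_d (sigma_d in pixels)."""
    return gsd_m * sigma_disp_px / base_to_height


def disparity_std(base_to_height, gsd_m, sigma_z_m):
    """Inverse propagation: image-space accuracy implied by a vertical std."""
    return sigma_z_m * base_to_height / gsd_m


# 8 cm GSD and an assumed disparity accuracy of 0.2 px.
for bh in (0.125, 0.25, 0.5):
    print(f"B/H = {bh}: sigma_Z = {depth_std(bh, 0.08, 0.2):.3f} m")
# B/H = 0.125: sigma_Z = 0.128 m
# B/H = 0.25: sigma_Z = 0.064 m
# B/H = 0.5: sigma_Z = 0.032 m
```

Under this model, doubling the base-to-height ratio halves $\sigma_Z$ for the same image-space accuracy, which is why wider baselines can compensate for a moderately worse matching accuracy.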

Varying sensors and illumination conditions
In this section, the robustness of SGM with respect to different sensors and flight-specific variations such as illumination conditions is examined. All tests were carried out on the test area 'Sportsfield'. SGM stereo models were generated using imagery of the three camera systems UltraCamX, DMC and RMKTop15. As already displayed in Figure 2, all images possess rather low texture; the UltraCamX data in particular offers only rather homogeneous intensities.
Table 1: Results for test area 'Sportsfield' based on SGM and 8cm GSD imagery, base-to-height ratios 0.26-0.29.
For each system, image pairs providing comparable base-to-height ratios of 0.25 and 0.5 were matched. The same penalty settings were used for all camera platforms throughout this test. As in the previous investigations, SGM point clouds were compared to a reference plane fitted to the LiDAR data. Again, point-to-plane residuals and the corresponding unbiased standard deviations $\sigma_z$ were calculated, and outliers with residuals larger than $3\sigma_z$ were removed. Finally, the residuals and corresponding standard deviations of the filtered points were computed. Results for base-to-height ratios of 0.25 are listed in Table 1. The mean values of the filtered point clouds are in the range of the errors induced by the bundle block adjustment (Jacobsen et al., 2010). The standard deviations are in the same range, which means that the accuracy largely depends on the results of the aerial triangulation. As the DMC imagery provides some texture and a high signal-to-noise ratio, the noise of the matching results is limited. Matching of UltraCamX imagery fails in some regions of the sports field, which results in lower point densities. This is due to homogeneous intensities, which are probably a result of unfavourable illumination conditions, and it underlines the importance of diverse texture and radiometric resolution for SGM. Nevertheless, the number of reconstructed points for both systems is significantly larger than the number of measurements obtained by LiDAR. As in the previous test, the matching accuracy $\sigma_I$ in image space was derived by error propagation. Since only one stereo pair is matched, $\sigma_I$ represents the actual matching accuracy of SGM, with absolute values smaller than 0.2 pixels.
Matching wide-baseline image pairs is unfavourable, since it can result in low point density and frequent mismatches. Nevertheless, imagery of the RMK is only available at a base-to-height ratio of 0.56. Therefore, SGM was evaluated for imagery of all three camera systems at comparable base-to-height ratios. Results are displayed in Table 2. As discussed in section 3.2, the decreasing performance of image matching can be compensated by the better geometric properties of the image configuration, and the same effect is observed in this test. As expected, SGM on digital imagery outperforms its analogue counterpart regarding accuracy, since the lower signal-to-noise ratio of the analogue imagery causes larger standard deviations. Nevertheless, a matching accuracy of 0.31 pixels is achieved and the derived point clouds are rather dense.

Comparison of MATCH-T and SGM results
In this section, additional point clouds of the test area 'Sportsfield' were derived using the commercial software MATCH-T, which addresses the correspondence problem using feature-based and least-squares matching techniques. Unlike our SGM processing, the commercial tool incorporates all available imagery for object point generation, whereas for the point clouds derived by SGM only 1 to 3 stereo pairs were matched and the results were merged within the multi-stereo procedure. SGM was parameterized with the standard values described in section 3. Again, accuracy was evaluated by comparing the residuals of the generated points to a plane fitted to the LiDAR data. The standard deviations of the point-to-plane residuals amount to 3.4 cm for the feature-based approach and 2.7 cm for the SGM method. It has to be kept in mind that, due to its black-box character, the parameterization of MATCH-T is limited and it might not perform at its optimum. Furthermore, due to the different weather and illumination conditions during the acquisition of the DMC and UltraCamX imagery, this test does not allow for a comparison of the different digital camera systems, but it demonstrates the capability of the matching algorithms in areas of very homogeneous texture.

Combination of point clouds from different viewpoints
The UltraCamX data set provides imagery with overlaps of 80% in flight direction and 60% across flight direction. Therefore, object points are visible in at least 10 images. Within the implemented multi-stereo matching strategy, an object point must be determined in at least three images to be valid. Moreover, only object points visible in the base image can be triangulated, so additional points that appear only in the match images due to viewpoint changes cannot be reconstructed. This is a hard restriction, especially when matching images of neighbouring flight strips. Furthermore, the base-to-height ratios of image pairs from different flight strips are generally larger than for same-strip imagery, which, as shown before, leads to a decrease in matching performance and completeness. To overcome this problem, in-strip imagery of three adjacent flight strips was matched and the resulting point clouds were merged in object space. Enhanced models for the test area 'Castle' and the utilized image configurations are shown in Figure 8.
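The figure of at least 10 images follows from the overlap values. Along a strip, neighbouring footprints are shifted by $(1-p)$ of the footprint length, so a ground point lies in at least $\lfloor 1/(1-p) \rfloor$ images; the same argument across strips uses the side overlap $q$. The sketch below assumes an idealized regular block over flat terrain with nadir images; the function name is hypothetical.

```python
import math


def min_views(forward_overlap, side_overlap):
    """Lower bound on the number of images covering a ground point.

    Assumes a regular block over flat terrain: along the strip, a point lies
    in at least floor(1 / (1 - p)) images, across strips in at least
    floor(1 / (1 - q)) strips.
    """
    eps = 1e-9  # guards against results such as 4.9999999 for exact ratios
    along = math.floor(1.0 / (1.0 - forward_overlap) + eps)
    across = math.floor(1.0 / (1.0 - side_overlap) + eps)
    return along * across


print(min_views(0.8, 0.6))  # 10 (5 images along the strip x 2 strips)
```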

CONCLUSIONS
During our investigations, SGM proved to be a robust and easily parameterized matching algorithm. The best matching results were obtained for stereo images with short baselines, good texture and high signal-to-noise ratios. Naturally, matching performance depends on the image content, since homogeneous intensities and low signal-to-noise ratios can limit the quality of the generated 3D point clouds. Still, matching accuracies better than 0.2 pixels at point densities better than 80 pts/m² or 60% of the available GSD were feasible even for a very low-textured sports field. The matching accuracy and reliability potentially decrease for large base-to-height ratios due to changes in perspective and illumination. Nevertheless, the beneficial geometric properties of larger baselines at least partially compensate for the reduced matching accuracy. Mismatches, which occur more frequently for large-baseline images, can be eliminated efficiently during the implemented multi-stereo matching. However, this outlier filtering of course reduces the density of the generated point clouds. For this reason, the combination of several stereo image pairs for multiple parallax estimates is especially beneficial: the combination of multiple measurements during triangulation increases the accuracy of the generated 3D point clouds, while their completeness can be increased by multi-stereo matching of imagery with varying perspective.
During our tests, imagery from the two digital sensors UltraCamX and DMC and from the analogue camera RMKTop15 was used. By these means, the robustness of SGM with respect to different sensors and flight-specific variations of illumination could be demonstrated. Although methods of post-processing might vary and have a significant influence on the matching performance, SGM impressively proved its potential for the generation of high quality Digital Elevation Models.

Figure 4: Accuracies of raw and filtered point clouds derived by stereo and multi-stereo matching.

Figure 5: Camera configurations and corresponding point clouds; red and blue rectangles mark the base and match images, respectively.

Figure 6: Standard deviations $\sigma_{xy}$, $\sigma_z$ and number of successfully triangulated points for camera configurations 1-3.

Figure 7 displays all point clouds derived by SGM and MATCH-T. While the matching success rate of MATCH-T decreases significantly for RMK imagery, SGM appears rather robust with respect to the reduced signal-to-noise ratio of the analogue camera system: it provides a rather complete reconstruction and a standard deviation $\sigma_z$ of 4.6 cm. Due to homogeneous texture, SGM fails in some regions of the UltraCamX imagery; the successfully reconstructed points provide a standard deviation $\sigma_z$ of 5.2 cm.
Figure 7 panels: DMC, UltraCamX, RMK.
Figure 8: (a) Point cloud derived by SGM using images of a single flight strip; (b) combination of point clouds generated using imagery of three flight strips; test area 'Castle', UltraCamX, 8 cm GSD.