ADAPTIVE HIERARCHICAL DENSE MATCHING OF MULTIVIEW AIRBORNE OBLIQUE IMAGERY

Traditional single-lens vertical photogrammetry can obtain object images from the air with rare lateral information of tall buildings. Multi-view airborne photogrammetry can get rich lateral texture of buildings, while the common area-based matching for oblique images may lose efficacy because of serious geometric distortion. A hierarchical dense matching algorithm is put forward here to match two oblique airborne images of different perspectives. Based on image hierarchical strategy and matching constraints, this algorithm delivers matching results from the upper layer of the pyramid to the below and implements per-pixel dense matching in the local Delaunay triangles between the original images. Experimental results show that the algorithm can effectively overcome the geometric distortion between different perspectives and achieve pixel-level dense matching entirely based on the image space. * Corresponding author.


INTRODUCTION
The traditional single-lens photogrammetry cannot effectively obtain the lateral texture of tall buildings, while multi-view airborne photogrammetry can compensate for this shortcoming (Zhu et al., 2013).Combination of vertical images and oblique images into 3D modeling with abundant texture is an important trend in the development of photogrammetry (Gui et al. 2012;lo et al. 2013).This platform is equipped with multipleperspective sensors to ensure more comprehensive access to objective information.The structure is usually mounted as one vertical lens with 4 ( or 2) oblique lenses (Zhu et al., 2013).
Image matching is a critical part of digital photogrammetry, whose quality directly affects the accuracy of the DSM (Digital Surface Model).The popular area-based matching methodology for multi-view oblique images almost completely loses efficacy because of the large geometric distortion between multi-view images.Matching algorithms for images with large geometric distortion can be invariant feature-based matching methods, e.g.SIFT (Scale Invariant Feature Transform) (Lowe, 2004), SURF (Speeded-Up Robust Features) (Bay et al., 2009), ASIFT (Affine -Scale Invariant Feature Transform) (Moreal and Yu, 2009).Dense matching methodology is usually narrowing searching area based on comprehensive utilization of imagespace and object-space information, e.g.GC 3 (Geometrically Constrained Cross-Correlation), MVLL (Modified Vertical Line Locus).But these algorithms are mainly used for images of the same perspective and experiments on multi-view oblique images are rare.In this paper, a dense matching algorithm specifically for multi-view oblique images is proposed based on invariant feature-based matching and geometric correction of area-based matching.

MULTI-VIEW AIRBORNE OBLIQUE IAMGES
Take five-view camera as an example, in multi-view airborne photogrammetry, the five lenses expose at the same time at a single shot, and images and their exterior orientation elements are obtained.The operation principals of the vertical lens are similar to those in the traditional photogrammetry.The tilt directions and placement characteristics determine the difference between oblique images and vertical images.The four oblique lenses are usually mounted as Malta shape, e.g.Chinese TOPDC-5, SWDC-5 and AMC580, of which the angles between vertical and oblique lenses generally range from 45° to 60°. Figure 1  Figure 1 shows there is not only rotation of optic axis but also pitching differences between images of different views.Moreal and Yu (2009) put that longitude difference and latitude difference from geography could be used to quantitatively measure the two kinds of tilt degrees in oblique images.In figure 1, the tile angles of oblique lenses are all 45°, so the longitude differences between (a) and (b), (c), (d) are 0°, 180°, 90° respectively and the latitude differences are all 45° (Zhang et al., 2014).The image contrast and brightness of different perspectives are of a significance difference and serious overlapping and redundancy exist in multi-view images.In addition, scales on one single oblique image are not constant, which is the most important difference from vertical images (Grenzdorffer et al., 2008;Gruber and Walcher, 2013;Petrie, 2008;Hohle, 2008).These characteristics of oblique images make the effects of feature-based matching and area-based matching decline sharply, and even fail completely.

THE OVERALL PIPELINE OF HIERARCHICAL MATCHING
Image matching is the process of searching the corresponding entity between the reference images and the matching images.
Often the corresponding point cannot be found on the matching image and then this problem seems to be insoluble.Geometric constraints are often used to initially narrow the searching area thus to improve the matching efficiency and reliability.In computer vision, given certain pairs of corresponding points, the relative camera geometry and the entire geometry of two images can be obtained.The widely used constraint methods are homography and epipolar constraint, the relevant matrix being the homography H and the fundamental matrix F (Hartley and Zisserman, 2003;Yang et al., 2012).
Image frame of the frame camera imagery is considerably wide now, so three-layer hierarchical matching strategy is used to deliver and refine the matching results.The corresponding pairs and geometric relationship are passed down to the original layer, and the per-pixel matching is finally realized on the original images.Figure 2 shows the overall pipeline of matching process between vertical image and oblique image.Although SIFT descriptor is invariant and stable towards image rotation, our experiments show that SIFT and ASIFT for images rotated to the same direction in advance can achieve better matching effect than for images without rotation.So overlapping images are rotated to the same perspective first so that the images will point in the same direction in general.Assume that the original image size is wh  (width× height) and the point coordinate on the rotated image is   , xy .Then there are three conditions of calculating the screen coordinates on the original images: 1. Before 180° rotation, the screen coordinate is 3. Before anticlockwise 90° rotation, the screen coordinate is Then both the original images after rotation are resampled twice to obtain three-layer image pyramids, from bottom to top: layer 0 (the original layer), layer 1 (the middle layer), layer 2 (the highest layer).On the second layer, after low-pass filtering twice, the image size tends to be smaller and detailed information has been largely filtered out.The remains are mainly overall texture so scale-and rotation-invariant ASIFT is appropriate to implement here.With those delivered corresponding pairs, the distortion geometry is estimated between the two images on the layer 1.The matching image is rectified to the reference image with the geometric relationship.Assume that the imaged areas are both plane and the systematic error impact of lens distortion and atmospheric refraction is overlooked here.Then the perspective transformation can fully simulate this "image-to-image" coordinate correspondence between the two images (Yang et al., 2012).The efficient SURF matching algorithm is used between the reference image and the rectified matching image to achieve sub-pixel matching.Certainly the pixel coordinates on the rectified matching image should be projected reversely to the original Layer 1.To ensure the accuracy of rectification and inverse calculation, the bilinear interpolation or cubic convolution interpolation is adopted here.
When the corresponding pairs are passed to the original images on layer 0, Delaunay triangulation is constructed among the evenly-distributed matching pairs after screening.The empty circle and the maximum minimum angle characteristics of Delaunay triangulation can ensure the nearest points are used to restrain the dense matching process (Zhang et al., 2013;Wu et al., 2011).To ensure the Delaunay networks on both images are corresponding and consistent, the Delaunay network is constructed first on the reference image with the reliable matching features.Following this is construction of network on the matching image with the chain code sequence of the left network.Finally, per-pixel dense matching is implemented in corresponding triangles.

Delaunay Triangulation on Layer 0
Delaunay triangular network is established with upper matching pairs on layer 0, which is followed by per-pixel dense matching in each Delaunay triangle.According to the theory of continuous disparity (Wu et al., 2011), the corresponding pixel of the reference image pixel must be located in or around the corresponding triangle.For a pair of corresponding triangles, our method calculates the circumscribed rectangles of both the triangles first and take the new rectangle with maximum length and width as the local rectification cell.The rectangular calculation cell should be expanded outward a certain number of pixels (half of the window size for correlation coefficient calculation) to ensure adequate surrounding pixels take part in the calculation of correlation coefficients.At that time, each pair of corresponding Delaunay triangles corresponds to one pair of "Rectangular Patches".In each pair of rectangular patches, dense matching is implemented for every pixel in the Delaunay triangle as following: 1. Search for the feature-based matching pairs in the rectangular range twice as wide as the rectangular patch to 2. Rectify the matching patch to the reference patch based on the perspective transformation matrix.Ideally, the pixels on the rectified patch exactly corresponds to the ones on the matching patch, i.e. the pixels with the same coordinates on both patches are ideally corresponding pixels.But owing to terrain deformation and undulation, there is a slight offset between the corresponding pixels.
3. For every point P on the reference triangle, open the same nn  searching window around the pixel of the same coordinate on the right patch.Epipolar geometry is used to further constrain the searching space.Normalized Cross Correlation (NCC) (Wu et al., 2011) is taken as the correlation measurement here which is calculated between the reference pixel and the matching pixels in the searching window.The pixel on the right patch with the maximum NCC which is larger than the given threshold is the final corresponding pixel.
Here we take rectangular patch as the rectification cell and the pixels in the triangles are only rectified once, thus the process of calculating NCC only involves "reading" gray level of pixels, and avoiding pixel projection and interpolation, which effectively improved operational efficiency.Besides, the NCC threshold is adaptively determined with the three pairs of triangular vertices.If the total number of corresponding pairs for calculating perspective transformation matrix is only 4 or 5, then the perspective matrix is not reliable enough and the threshold needs to be further increased by 0.1.Tests show that threshold of NCC generally ranges from 0.65 to 0.85.

Homography and Fundamental Matrix
In the imaging process, the perspective transformation can be represented with the homography H. Assume that the overlapping area or the imaged area is absolutely plane and only rotation, translation and scaling change exist, then all the corresponding points between the two images can be characterized with only one transformation matrix.The reference pixel coordinate is ( , ,1) T xy ,the matching pixel coordinate is ( , ,1) T xy  , the homography between them is in (Yang et al., 2012): H a a a a a a a a a Then there is the following relationship between the two coordinates (Yang et al., 2012): a x a y a x a x a y a a x a y a y a x a y a Epipolar geometry is widely used as image matching constraint: given the fundamental matrix F between two images and the reference pixel coordinate, the epipolar line can be determined on the matching image.Theoretically, the corresponding point on the matching image must be on the corresponding line, so epipolar geometry can transfer the twodimensional searching into one-dimentioanal.F is determined by imaging camera position and orientation and can also be approximately obtained with at least 8 pairs of corresponding pixels (Hartley and Zisserman, 2003).
F is calculated with the delivered SURF matching pixels from layer 1.Take these pairs as check points, experiental results show that 98% of these points can achieve the accuracy of 5 pixels and 100% can achieve the accuracy of 6 pixels.The constraint effect is shown in Figure 3.The horizon-axis is the corresponding point sequence and the vertical-axis is the constrait offset.Figure 4 shows the constrait effect of 5-pixel accuracy of features and the corresponding epipolar lines.the circles are the matching points.By further comparison, epipolar constraint can achieve higher accuracy than homography constraint, i.e., F can better express the overall geometric transformation of two images than H.So on layer 0, homography constraint is used in the local triangles, while epipolar constraint is globally used.When calculating H or F, more than necessary pairs of corresponding points are used to obtain optimal solutions under the RANSAC (Random Sample Consensus) (Fisehler and Bolles, 1981) criteria.

Feature Screening
In the upper two layers of the image pyramid, two types of corresponding features need to be screened.One is mismatching points derived from ASIFT and SURF, the other is dense matching points.For the first type, RANSAC can take the role to screen them.The over-dense matching pairs are screened in case of non-significant Delaunay constraint and low efficiency.The main idea of feature screening is judging the distance of two pairs of corresponding points on the reference and the matching images.If one of the two distances is less than the threshold, one of the two pairs should be removed.Provided that n pairs of corresponding points are given, the number of judgement is The premise of the screening process is the acquisition of reliable and high-precision matching features.After screening, the matching features can be used to establish Delaunay triangles.

EXPERIMENTS
The experimental data are both derived from the airborne images over Dengfeng taken by AMC580.The final matching results of the two experiments are showed in Figure 5  This paper adopts the manual method to testify the matching accuracy.On the original images, 50 easily distinguishable corresponding pairs are selected and we find that the numbers of correct matching for the two groups are 49 and 46 respectively and the matching accuracy goes to pixel-level.There are two main sources of mismatches: one source is matching errors on the textureless or texture-repetitive area; the other is projection and interpolation error on layer 1 and layer 0.

CONCLUSIONS AND OUTLOOK
Without POS data, this paper implements the dense per-pixel matching which can effectively compensate for the longitude and latitude deformation of multi-view oblique images.The image hierarchical strategy and image space-based geometric constraints are used to restrain the searching area and refine the matching results.The Delaunay triangulation is used to compensate for the serious geometric distortion to achieve the successful pixel-level matching for at least 75% pixels on the restrained area of the reference multi-view images.Since the Delaunay triangles on the original images are rectified only once, the searching process can be implemented around the pixels of the same coordinates and thus avoid the repetitive projection and interpolation.For the challenges of matching textureless or texture-repetitive area, multi-view matching and object space information is to be further studied and combined into our algorithm.
shows the vertical-, forward-, rear-and leftview images of the same scene taken by five-view AMC580 camera over Dengfeng, Henan province.(a) Vertical-view (b) Forward-view (c) Rear-view (d) Left-view Figure 1.The multi-view images of AMC580

Figure 2 .
Figure 2. The overall pipeline of hierarchical dense matching

The
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-3/W2, 2015 PIA15+HRIGI15 -Joint ISPRS conference 2015, 25-27 March 2015, Munich, Germany calculate the perspective transformation geometry.The featurebased pairs are delivered from SURF on Layer 1.

Figure 3 .
Figure 3. Constraint accuracy of epipolar geometry 2 n C .For the i th and the j th corresponding points which are independent, assume that DLij is the distance of The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-3/W2, 2015 PIA15+HRIGI15 -Joint ISPRS conference 2015, 25-27 March 2015, Munich, Germany the two features on the reference image, DRij  two pairs are too closely distributed, so remove one pair of them.1T is the repetitive and redundant threshold, which takes 5 pixels here.two pairs are so close that the triangulation constraint is not significant, so remove one pair.2T is the minimum distance threshold, often determined by the terrain condition and matching accuracy and should increase gradually down along the image pyramid.
and Figure 6.In these two figures, (a) and (b) are matching results of layer 2 and layer 1 respectively; (c) are the Delaunay triangulation on layer 0; (d) are point clouds generated from matching pairs assisted with the exterior elements derived from the commercial Icaros Photogrammetric Suite (IPS).(a) Matching results of layer 2 (b) Matching results of layer 1 (c) Triangular networks on layer 0 (d) Dense point cloud Figure 5. Matching results of Group 1 (a) Matching results of layer 2 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-3/W2, 2015 PIA15+HRIGI15 -Joint ISPRS conference 2015, 25-27 March 2015, Munich, Germany This contribution has been peer-reviewed.doi:10.5194/isprsarchives-XL-3-W2-289-2015(b) Matching results of layer 1 (c) Triangular networks on layer 0 (d) Dense point cloud Figure 6.Matching results of Group 2 By comparison of Figure 5(a) and (b), the feature-base matching of ASIFT on Layer 2 and the SURF on Layer 1 can achieve the similar distribution of matching features.In the textureless area, the ASIFT and SURF are difficult to achieve successful matching.In Figure 6(a) and (b), there is no successful matching in the upper right area (the vegetarian area) and these areas are usually constrained by large triangles and thus difficult to achieve a successful match in the following area-based matching (see the sparse point cloud on the upper right area in Figure 6(d)).Certain pairs of the 186 Delaunay triangles and rectangular patches in Group 1 are shown in Figure 7. On each row, the first two patches are the corresponding rectangular patches and the following two are rectangular patches with the corresponding Delaunay triangles.The corresponding rectangular patches are of the same size.The left images shown in Figure 7 are the reference images while the right are the rectified matching images.So the corresponding patches on every row look similar to each other and the corresponding pixels are close with only several pixels' offset.(a) Rectangular patches (b) Delaunay Triangles Figure 7. Part rectangular patches of Group 1

Table 1 .
They are cut from the original 10328 ×7760 oblique images.The data information is shown in Table1.The dataset information Under the 3-layer image pyramid matching strategy, the reliable matching results and Delaunay triangle numbers are listed in Table2.Group 1 is matching images of plain area, and the terrain is plane.Because of the overall direction is not the same, the rear-view image should be rotated by 180° first.Only 64 pairs of ASIFT matching pairs are remained after screening.100 pairs of matching pairs are obtained on layer 1, which are delivered to Layer 0 to establish 186 Delaunay triangles.Perpixel dense matching is conducted on layer 0 finally and 214,028 pairs of corresponding points are obtained.Group 2 is matching the vertical-and forward-view images.340 Delaunay triangles are established on Layer 0 and 317,551 pairs of points are matched.Every pixel in the Delaunay triangles is meant to search for the corresponding pixel on the matching image.The matching success rate is the percentage of successful matching numbers to all the points on the reference Delaunay triangles.The matching success rates of the two experiments are both over 75%.

Table 2 .
Matching results