AN AERIAL-IMAGE DENSE MATCHING APPROACH BASED ON OPTICAL FLOW FIELD

Dense matching plays an important role in many fields, such as DEM (digital evaluation model) producing, robot navigation and 3D environment reconstruction. Traditional approaches may meet the demand of accuracy. But the calculation time and out puts density is hardly be accepted. Focus on the matching efficiency and complex terrain surface matching feasibility an aerial image dense matching method based on optical flow field is proposed in this paper. First, some high accurate and uniformed control points are extracted by using the feature based matching method. Then the optical flow is calculated by using these control points, so as to determine the similar region between two images. Second, the optical flow field is interpolated by using the multi-level B-spline interpolation in the similar region and accomplished the pixel by pixel coarse matching. Final, the results related to the coarse matching refinement based on the combined constraint, which recognizes the same points between images. The experimental results have shown that our method can achieve per-pixel dense matching points, the matching accuracy achieves sub-pixel level, and fully meet the three-dimensional reconstruction and automatic generation of DSM-intensive matching’s requirements. The comparison experiments demonstrated that our approach’s matching efficiency is higher than semi-global matching (SGM) and Patch-based multi-view stereo matching (PMVS) which verifies the feasibility and effectiveness of the algorithm.


INTRODUCTION
Dense matching is a key ingredient in automated acquisition of geometric models and 3D scenes from image sequence or videos, a process known as image-based 3D reconstruction.As for its application, dense matching can be used as for large-scale mapping, true orthophoto generation, environmental surveying archaeology and culture heritage, and especially 3D city modelling (Xiao, 2016).In last decades, numbers of wellperformed algorithms have been developed they mainly divided into four types.Voxel based approaches, deformable polygonal based approaches, depth map based approaches and patch-based approaches.Voxel based approaches require knowing the bounding box that contains the scene that their accuracy is limited by the resolution of the voxel grid (Furukawa, 2010).Methods based on deformable polygonal meshes demand a high quality start point, such as a visual hull model (Laurentini, 1994), they use this kind of start point to initialized the corresponding optimization process.Those two kinds of approaches are often limited to the datasets quality and initial processing quality, which are not flexible.Compared to the voxel-based and polygonal mesh-based approaches, patch-based approaches such as patch-based multi-view stereo (PMVS) (Furukawa, 2010), and depth-based approaches such as semiglobal matching (SGM) (Hirschmüller, 2008) are more flexible.SGM and its acceleration algorithms are widely used in digital evaluation model (DEM) generation and 3D scene reconstructions (Halaa, 2012).Although SGM is more flexible than voxel-based and polygonal mesh-based approaches, it require fusing individual depth map into a single 3D model.And the noise of depth maps often influence the final results accuracy and reliability.Rothermel proposed an enhanced SGM by using dynamic disparity range to search the pixel correspondence (Rothermel, 2012).Patch-based approaches are more reasonable in one scene than rectangle window matching in one image.Furukawa generates a sparse set of patches corresponding to the salient image features, and then spreads the initial matches to nearby pixels and filters incorrect matches to maintain surface accuracy and completeness.It is more robust to different dataset, and has no require of initialization and topology assumption.PMVS and SGM are using geometric information to search the correspondence.The essence of these methods is dual decomposition.They break down the main task into a problem sequence.Using iterative process solve the whole problem sequence to find the similar or optimized solutions.
On photogrammetry aspects, the geometry of the topologic relation between aerial images or UAV images is complex.And the terrain of the surface is changeable.That makes difficulties for directly using the well-performed computer vision matching approaches.On the other hand the dataset in Photogrammetry is large.To improve processing the efficiency Mingyao Ai improved the PMVS for UAV platform (Ai, 2015), it feeds the PMVS software with matched points as seed points to obtain a dense point cloud.And Xiongwu Xiao proposed a self-adaptive patch and image grouping PMVS approach to improve the matching efficiency (Xiao, 2016).
Although these two developed PMVS methods improved the processing efficiency.The datasets they used are UAV images, which is 5 or 10 times smaller than traditional aerial image.In order to deal with aerial images and improve the matching accuracy a novel optical flow field based dense matching methods for aerial image is proposed in this paper, which is characterized by large scale aerial images, complex surface terrain.The proposed method contains 3 steps.The coarse matching based on Optical flow interpolation, the fine matching based on dual-constraint rectify and local optimization ， the mismatching elimination based on RANSAC and relative orientation.It is used to improve the aerial image dense matching density, efficiency and accuracy.The experimental results demonstrate that the proposed method is feasible and meet the demand of the DEM generation, while the matching accuracy is at the same level as PMVS and SGM, the matching efficiency is higher than PMVS and SGM.

METHODOLOGY AND WORKFLOW
In this paper a novel aerial image dense matching method based on optical flow field has been proposed.The approach proceeds as follows: First, some high accurate and uniformed control points are extracted by using the feature based matching method PCA-SIFT (Ke and Sukthankar, 2004) which extracts uniformed accurate correspondence between the image pairs as the seed points.Then the optical flow is calculated by using these seed points, so as to determine the overlap region between image pairs.Second, the optical flow field is interpolated by using the multi-level B-spline interpolation in the overlap region and accomplished the pixel by pixel coarse matching.Third, the fine matching based on dual-constraint is conducted on the coarse matching results.At last the RANSAC based relative orientation algorithm is adopted to eliminate the mismatching points, thereby the dense correspondence of the overlap area can be constructed.
As can be seen from the Figure 1, the key steps of the whole approach are UR-SIFT matching to define the control lattice, optical flow field interpolation by multi-level B-spline.Dualconstraint based fine matching.RANSAC based mismatching elimination.
Figure 1.The matching strategy of proposed approach

PCA-SIFT based matching
To ensure the seed points are uniform coverage and the overlap between the image pairs are accurate, The PCA-SIFT was adopted to obtain the initial corresponding points.
PCA-SIFT was proposed by Ke and Sukthankar in 2004, they using principal component analysis to reduce the SIFT descriptors dimension.Firstly SIFT features are extracted by traditional SIFT method, after that the 41×41 image block was selected, which centralized by feature point.Secondly, the horizontal and vertical grey gradient of the image block is calculated to construct a 3042 dimensional vector as the PCA-SIFT descriptor.Suppose the extracted SIFT features number is n, then a 3042 × n matrix S can be utilized to describe the extracted PCA-SIFT features.The covariance matrix of S can be calculated as formula 1: Where m  = mean vector of SIFT descriptor x = corresponding SIFT descriptor E= expected value C x = covariance of vector x By calculating the eigenvalues and eigenvectors of matrix C we select the largest k-th eigenvalues and the corresponding eigenvectors to build a k×3042 projection matrix   .Through the experiments, Ye find out that when k equal to 36, the computation efficiency and the descriptors uniqueness are ensured in a stable level.The SIFT descriptor can be represented as formula 2: Then reduce the descriptors dimension to 36.Use the Euclidean distance to find the correspondence.After the PCA-SIFT matching result was obtained， it can be utilized as the initial value of the least square matching to improve the correspondence accuracy.

Optical flow based coarse matching
The essence of image matching is finding each pixel movement between the image pairs.In order to get this information we use PCA-SIFT matched correspondence as the seed points to calculate some discrete optical flows.In order to describe each pixel's motion the thin optical flow field is not enough.Traditional optical flow measurements assume that in local region the brightness of the pixel will not change and the pixel's move range is small.When dealing with the aerial images.Those assumptions are not existed.Aerial image has a lot of distortion because of the photography position is changing all the time.And occlusion may cause brightness change.In this case a Multi-level B-spline interpolation is proposed to simulate the optical field only consider on the geometry aspect.
The coefficient between the interpolation nodes and the seed points was weighted by the above formulas and Gaussian function in distance.That f can be denoted as formula 6.
To reducing the interpolation error and make result more reliable.We improve our model to multi-level.Assume the first interpolation grid is Φ 0 , the interpolation function f 0 can be calculated as mentioned above.Obviously f 0 is the approximation function to interpolate P, so the difference between them can be denoted as Δ c 1 =z c -f 0 (x c , y c ).Then we use the fine interpolation grid to approximate P 1 =(x c , y c ,Δ c 1 ) by calculating the interpolation function f 1 .Then we can accumulate f 0 +f 1 and obtain a smaller difference to P. The difference after second interpolation can be denoted as Δ c 2 =z c -f 0 (x c , y c )-f 1 (x c ,y c ).
Through iterations the optimized interpolation function can be denoted as formula 7.
where k = iteration times.
After Optical flow field simulations.We can obtain the coarse matching result.As B-spline is very smooth, the coarse matching results accuracy is ordered by the distance between interpolated points and seed points.

Dual-constraint fine matching
The concept of our dual constraint rectify is based on epipolar constraint and affine transform.In the coarse matching step, PCA-SIFT matching combined with least square matching are utilized to conduct some high accuracy and reliable correspondence.By using these point pairs, the epipolar line of each point pairs in the left and right images can be calculated.
For each pixel in the left image, if it has not matched in the PCA-SIFT matching step, it approximate corresponding in right image can be extract by calculate its optical flow.Then using the epipolar line and expand it with the neighbor 2 pixels to build up a template and the searching window, to do the fine matching.The matching template and searching window is shown as Figure 3 Left image Right image Figure 3.The matching template and searching window According to figure 3 the shelter area Ω 1 and Ω 2 is the selected matching area.l 1 is the epipolar line of point a in the left image, l 2 is the corresponding epipolar line of point b in right image.a and b are correspondence calculate by the optical flow.Normally we can use 3 kind of information to describe a point: position, scale and angle.We denote the corresponding point pair as (  ,   ,   ) and (  ,   ,   ) , and denote the dual constraint rectify matrix as formula 8.
where S x = scaling coefficient on x direction S y = scaling coefficient on y direction ϑ = corresponding windows relative rotation angle.
The rectify procedure can be shown as figure 4. In calculation we use the distance between the selected points and the seed point weighted the scale coefficient.The weigh function is same as multi-level B-spline interpolation we described in section 2.2.Assume the number of the seed point pairs is n.The i-th seed point's contribution to selected points is ρ i , We calculate the scale coefficient as formula 9.
Dual constraint rectify matrix is utilized to affine the matching area.The final matching image blocks are shown as W 1 and W 2 in figure 4. In this way a lot of redundant computation can be reduced.
After dual constraint rectify, the NCC (Normalized crosscorrelation methods) is employed for fine match.According to the NCC similarity, a 3×3 area centered by the matched point is selected to do a polynomial fitting.At last, the optimized matching point is determined as the extreme point in the polynomial curve.The polynomial denote as follows.
f(x,y)=a 0 x 2 +a 1 y 2 +a 2 xy+a 3 x+a 4 y+a 5 (10) In formula 10, we denote f(x,y)=1/NCC(x,y) .Use Gaussian function to weigh each pixel's contribution by distance.The final result is generated by least square method.
x opt = (2a 1 a 3 -a 2 a 4 )/(a 2 2 -4a 1 a 0 ) While the iteration time achieve a certain number and the result still can't meet the condition we set.RANSAC fails.
In our approach, the test sample has a very large dataset.Use the traditional RANSAC may have numerous iterations.
Assume that the non-mismatching probability of the fine matching result is p, the probability of acquiring at least n nonmismatching point pairs from a k times random selection in fine matching results is Q.Then (1 − p n )  denote the probability of the selected k subsets each one contained mismatching point pairs.Thus k can be denoted as follows: From 11 we find that based on our assumption, RANSAC iteration times are not affected by observations number.
Normally the mismatching rate in PCA-SIFT is around 20%.As the proposed dense matching approach use the PCA-SIFT results as the initial input.The mismatching rate is related to PCA-SIFT.Assume the mismatching rate of proposed approach is 30%.Relative orientation needs at least 5 corresponding point pairs.If we want the correct probability achieve 99%.By calculation, k= 20.
In photogrammetry, the correspondence in a stereo pair fulfils the co-planarity condition.Thus, we use the co-planarity condition as the mathematical model to operate RANSAC.Use least square method to calculate the error function.The point's residual larger than 3 times of the mean-error will be eliminated.

EXPERIMENTS AND ANALYSIS
In this section the experiments result is evaluated with accuracy, robustness and efficiency.First experiment focus on analysing proposed dense matching approach.Second experiment is a comparison experiment to SGM and PMVS method.All the experiments use C++ as the programing language, VS 2010 as the programming platform.And the PC with Intel core i7 4700M processor, 16GB RAM.The detailed experiments are shown as follows.
The first experiment's dataset is a stereo image pair.Shown as figure 5, the images are taken by Leica RCD 30 camera.Land surface is hills area.Most parts of the image are covered by farmland.Also some houses, forests and pools are involved in.The camera parameters are described as follows: focal length is 53.0 mm; CCD pixel size is 6 μm; image size is 6732×9000 pixels; the overlap between the stereo pair is 60%.We used WuCAPS (Yuan, 2008)   From figure 7 we found that SGM result's accuracy is unstable.And the highest accuracy in the graph conducted by SGM method in the 10-th stereo pair.We checked it and found that the matched correspondence is concentrated in one area, and the matched points are extremely few in that stereo.Which means it is a fake high accuracy.Compared with PMVS and our approach the accuracy curve is stable, and the almost the same the difference between them is less than 10%.
As it shown in table 2, the optical flow based dense matching methods is more efficiency than the other 2 method.Through 30 images our approach obtained 32457821 correspondences the number is more than SGM and PMVS.And the operation time is less than PMVS.Compared with SGM the operation time is 1.18 times longer, but the point clouds density is 1.3 times higher.Which means our approach is more efficiency.As it shown in table 2, the optical flow based dense matching methods is more efficiency than the other 2 method.Through 30 images our approach obtained 32457821 correspondences the number is more than SGM and PMVS.And the operation time is less than PMVS.Compared with SGM the operation time is 1.18 times longer, but the point clouds density is 1.3 times higher.Which means our approach is more efficiency.

CONCLUSIONS
In this paper a novel aerial image dense matching approach is proposed.We use PCA-SIFT extract uniformed seed points, and utilized with multi-level b-spline interpolation to extract each pixel's motion between the stereos.By employing dualconstraint rectify in fine matching step.A lot of redundant computation is reduced.Finally RANSAC based relative orientation method is utilized for mismatching elimination.The experimental results demonstrate that the proposed approach is feasible, reliable and efficiency.
Figure 2. The relation schema of interpolation gridThe node value of the interpolation grid is the interpolation coefficient to the seed's points.The interpolation problem can be represented as finding the optimized interpolation grid Φ.As figure2 shown, the node value denote as follows;

Figure 5 .
Figure 5. Original aerial image stereo pair The matched 3D point cloud is shown as figure 6.

Figure 6 .
Figure 6.Dense matching point cloud From figure 6, we found that our dense matching point cloud is very density.Most part of the overlap area achieved the pixelwise match.After mismatching eliminate some unmatched points present in the building and forests area.It mainly cause by occlusion.The road, roof, pool area can be accurate presented.The forward intersection result from matched points fit well with the ground surface.Which means the result can directly use for DEM producing.

Figure 7 .
Figure 7. Accuracy curve of each method Random select a sub set S from P. S involved with n observations.Obviously, S can determine a mathematical model.Denote the mathematical model as M. Use all the observations in P to test model M. Set a test threshold.Passed observations can build a new set S 1 .Denote S 1 as the model M's consistent set.Second, compare the observation number of S 1 and the threshold t, if it larger than t.Use S 1 to determine the new mathematical model M 1 .Third, if the observations number of S 1 less than the threshold t.Use first step to find the S 1 can meet the demands that the observations number of S 1 no less than t.
Firstly, a mathematical model is selected.Assume that the model can be determined by at least n parameters.An observation set P is given, contained m observations, m>n.

Table . 1
Table1present that our dense matching result accuracy achieved ±0.55 pixel level.YUAN proved that the unit weight error of relative orientation can be utilized to evaluate the matching result's accuracy in image pixel coordinates.Although we select our point clouds by grid to operate the relative orientation the result present stable.Compared the each select result, the density increase 1 time the matching point pairs density increase 4 times.The unit weight error is stable to the density change.It means our results' accuracy is uniform and stable.It won't affected by the land terrain change and ground object change.The accuracy of our result can meet the demand of real applications.Accuracy of relative orientation with dense matching point cloudsThe second experiment is conducted on the same dataset.To ensure the results reliability, we use the whole image sequence which contains 30 images to make the comparison experiments on PMVS, SGM and our approach.The matched point clouds relative orientation accuracy curve is shown as figure7.And the matching efficiency is shown in table 2.

Table 2 .
The operation time of each method