SIFT FOR DENSE POINT CLOUD MATCHING AND AERO TRIANGULATION

This paper presents a new method for dense point cloud matching and aero triangulation based on the well-known scale invariant feature transform (SIFT) technique. Modern digital cameras can take high-resolution aerial images with high end lap between contiguous images in a strip and, if needed, high side lap between images on neighboring strips. Automated image matching for generating a high density of 3D object points therefore becomes feasible, and a new method is developed to perform this processing. Moreover, it performs aero triangulation and automatic tie point measurement without input data such as block and strip data that provide image overlap information. To increase the effectiveness of the method when simultaneously processing a large number of large-format aerial images in a block, two schemes, Quality Filtering (QF) and Affine Transformation Prediction (AFTP), are proposed for automatic tie point extraction and measurement with satisfactory efficiency. Tests are done using aerial images taken with the RMK DX camera in Taiwan, and high-precision ground check points are adopted to evaluate the quality of the results. The tests show that a high density of 3D object points is extracted and determined. Furthermore, automatic tie point selection and measurement is done efficiently even when no a priori knowledge of image overlap is available. Also, ground check points show that the accuracy of photo coordinates is 0.21 pixels, i.e. it reaches a subpixel level.


INTRODUCTION
One of the current issues in photogrammetry is dense matching, especially pixelwise matching of aerial images. Matching results provided by local stereo matching methods like Normalized Cross-Correlation (NCC) and Least Squares Image Matching (LSIM) are in general not reliable enough. Global matching (GM) of highly overlapping images increases the reliability, but its computational complexity is too high. The commercial software Photosynth/Geosynth by Microsoft Corporation utilized the GM technique for dense matching and stitching pictures together, with Virtual Earth, encouraging businesses to combine the two technologies (Computerworld, 2012). In order to reduce the runtime of GM, the German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt, DLR) developed the semiglobal matching (SGM) method. Both SGM and its extensions are described in publications such as (Hirschmueller, 2008 and 2011). They are adopted by the commercial software 3D RealityMaps to perform accurate and reliable dense point cloud matching, and are useful for many applications like 3D reconstruction of object surfaces, especially on local surfaces with occlusions, edges, fine structures, and low or repetitive textures (Siegert, 2011; RealityMaps, 2012).
For example, the pixelwise, Mutual Information (MI)-based matching cost is used for compensating radiometric differences of input images. The method offers a very good trade-off between runtime and accuracy, particularly at object borders. SGM has participated in several tests and evaluations. The Middlebury stereo pages (Scharstein and Szeliski, 2011) currently list 108 stereo methods. The consistent SGM that is modified for structured indoor scenes has a rank of 30 and an average error of 5.8%.
The SGM method is very well known in the field of computer vision and is used for finding corresponding pixels in a pair of images or in multiple images. It assumes that the image orientation data and the image overlap information are known. In photogrammetry, the unknown image orientations and object coordinates can be solved by bundle block adjustment, which is a primary process of geomatic data acquisition (Heipke, 1997). To further increase the degree of automation of modern aerial triangulation and geomatic data acquisition, this paper proposes a new method based on SIFT for dense point cloud matching without the need for any image overlap information.

Main Processing Phases
To obtain a more compatible architecture, a scale- and rotation-invariant method is selected for automatic tie point measurement, namely the well-known scale invariant feature transform (SIFT) technique. SIFT belongs to the class of feature-based matching methods and includes two main processing phases: keypoint extraction and keypoint matching (Lowe, 2004). Keypoint extraction includes Gaussian filtering and computation of the DoG (Difference of Gaussian) at different image pyramid levels to detect extreme values. The pixels with these extreme values are selected as keypoints, each described by a descriptor defined as a 128-dimensional vector. Keypoint matching then calculates the Euclidean distances from one keypoint descriptor on the left image to all keypoint descriptors on the right image, processing one pair of images at a time. If the distance ratio (the shortest Euclidean distance divided by the second shortest one) is smaller than the threshold, the keypoint on the left image is matched to its nearest keypoint on the right image. Otherwise, the matching for this keypoint on the left image fails.
The matching and searching operations are repeated until all keypoints are processed.
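The distance-ratio test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ratio_test_match`, the brute-force search and the synthetic descriptors are assumptions made for the example, and the default threshold of 0.2 is simply the strictest value used later in the tests.

```python
import numpy as np

def ratio_test_match(des_left, des_right, ratio=0.2):
    """Return (i, j) pairs: left keypoint i matched to right keypoint j.

    des_left, des_right: arrays of shape (n, 128) holding SIFT-like descriptors.
    A left keypoint is matched only if its shortest distance is smaller than
    `ratio` times its second shortest distance (hypothetical helper).
    """
    matches = []
    if des_right.shape[0] < 2:
        return matches  # the ratio needs at least two candidates
    for i, descriptor in enumerate(des_left):
        dist = np.linalg.norm(des_right - descriptor, axis=1)  # Euclidean distances
        nearest, second = np.argsort(dist)[:2]
        # Multiplying instead of dividing avoids a division by zero.
        if dist[nearest] < ratio * dist[second]:
            matches.append((i, int(nearest)))
    return matches

# Synthetic data: the first five left descriptors are slightly perturbed
# copies of the first five right descriptors.
rng = np.random.default_rng(0)
des_right = rng.random((50, 128))
des_left = des_right[:5] + 1e-3 * rng.random((5, 128))
matches = ratio_test_match(des_left, des_right)
print(matches)
```

A brute-force search like this costs one distance computation per descriptor pair, which is one reason the number of key points and image pairs matters for runtime.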
The proposed dense matching strategy is illustrated in Figure 1. Without the need for a priori knowledge of image overlap, the first step is SIFT keypoint extraction to obtain the location (abbreviated as Loc.) and descriptor (abbreviated as Des.) of each keypoint on the P input images (P ≥ 2). The loop number equals P, namely the number of input images. Step 2 is keypoint matching for the C(P, 2) = P(P - 1)/2 pairs of images, one image pair at a time. The result table of each image matching pair then stores the locations of the matched points and the numbers of the left and right images. Step 3 is matched point connection, which compares the locations of matched points and finally rearranges and codes all matched points into a numbered result. Every result table of a single image matching pair contains the locations (row, column) of matched points on the left and right images, denoted by (r_L, c_L) and (r_R, c_R). The table of the connection result stores the location (r, c), point number (PN) and index value for every tie point in each image, obtained by location matching using the result tables of the image matching pairs. The index value is used for descriptor inquiry, namely to indicate that the descriptor belongs to the i-th keypoint on the j-th image. The numbered tie points are therefore connected if |∆r| < 10^-6 pixels and |∆c| < 10^-6 pixels, and repeated measurements are eliminated at this step.
Furthermore, to increase the operational efficiency, especially for large-format images of m x n pixels (e.g. m = 12096 and n = 11200 for our test aerial images), the input image is first divided into small sub-images, as shown in Figure 3. Taking into account the capacity of the core processing programs executed on the adopted PC, sub-images of 1800 rows x 1800 columns are used in this study. Key points are then extracted in each sub-image, and the key points from all sub-images are merged to output the key point extraction result for the original large-format input image.
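The sub-image division and the merging of key points can be illustrated with a short sketch. It assumes non-overlapping tiles and a simple offset when merging; whether the original implementation overlaps neighboring sub-images is not stated, and the names `tile_windows` and `merge_keypoints` are hypothetical.

```python
def tile_windows(n_rows, n_cols, tile=1800):
    """Yield (row0, col0, row1, col1) windows that cover the image.

    Assumption: tiles do not overlap; the last tile in each direction
    is simply smaller when the image size is not a multiple of `tile`.
    """
    for row0 in range(0, n_rows, tile):
        for col0 in range(0, n_cols, tile):
            yield row0, col0, min(row0 + tile, n_rows), min(col0 + tile, n_cols)

def merge_keypoints(per_tile):
    """Convert tile-local key point locations to full-image locations.

    per_tile: list of ((row0, col0), [(r, c), ...]) entries.
    """
    merged = []
    for (row0, col0), points in per_tile:
        merged.extend((row0 + r, col0 + c) for r, c in points)
    return merged

# Image size of the test images in this paper: 12096 x 11200 pixels.
tiles = list(tile_windows(12096, 11200))
print(len(tiles), tiles[-1])
```

With these sizes, each direction needs seven tiles, so 49 sub-images are processed per image under the stated assumption.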

Quality Filtering (QF)
The extremely large number of key points limits the efficiency of matching a large number of large-format aerial images. In order to reduce the runtime, quality filtering (QF) attempts to retain only those key points with the best image quality. The standard deviation G_std of gray levels, computed by Eq. (1), is evaluated for every keypoint in a local image window of 15 x 15 pixels centered at the keypoint. Generally, G_std represents the contrast of the keypoint image. When noise is low, it also indicates the amount of texture information (the so-called quality) at the keypoint.

Affine Transformation Prediction (AFTP)
This method uses AFTP to estimate the overlap area and to predict the location of the search window, as shown in Figure 4. Instead of using the original high-resolution images, AFTP uses the higher-level images of the image pyramid, which contain fewer key points, to perform a fast pre-matching that maintains efficiency and at the same time determines whether follow-up processing is necessary. The image size at the top level is assumed to be about 700 x 700 pixels. If the six affine transformation parameters of an image matching pair can be calculated with proper accuracy by means of least-squares adjustment (LSA), then the two images overlap. The locations of their corresponding image points are approximately described by the affine transformation parameters, which can also be used to predict the search window. Otherwise, the image matching pair has no or very little overlap and is skipped in the follow-up matching process. As long as the well-overlapping image matching pairs are processed, the tie points can be connected correctly in the procedure for connecting all matched points.
Moreover, another advantage of applying AFTP on the top level of the image pyramid is that it automatically provides approximate image overlap information for the block. In other words, this study proposes a new method for SIFT-supported dense point cloud matching and aero triangulation that does not need known image overlap information, which is usually given by the input strip and block parameters.
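The least-squares estimation of the six affine parameters and the resulting overlap decision can be sketched as below. This is only an illustration under stated assumptions: the acceptance rule (at least `min_points` pre-matched points and a residual RMSE below `max_rmse` pixels) and all function names are hypothetical, since the criterion for "proper accuracy" is not given in the text.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of dst ≈ [r, c, 1] @ coef, with coef of shape (3, 2).

    The six entries of `coef` are the affine parameters; the RMSE of the
    residuals (in pixels) is returned as a simple accuracy measure.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])
    coef, *_ = np.linalg.lstsq(design, dst, rcond=None)
    resid = design @ coef - dst
    rmse = float(np.sqrt(np.mean(np.sum(resid ** 2, axis=1))))
    return coef, rmse

def predict(coef, pts):
    """Predict locations on the other image, e.g. search window centers."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ coef

def pair_overlaps(src, dst, min_points=4, max_rmse=2.0):
    """Hypothetical decision rule: enough points and a small residual."""
    if len(src) < min_points:
        return False, None
    coef, rmse = fit_affine(src, dst)
    return rmse <= max_rmse, coef

# Synthetic pre-matched points on a top pyramid level of about 700 x 700 pixels.
rng = np.random.default_rng(1)
src = rng.uniform(0, 700, size=(20, 2))
true_coef = np.array([[1.0, 0.02], [-0.02, 1.0], [350.0, -5.0]])
dst = predict(true_coef, src)
ok, coef = pair_overlaps(src, dst)
print(ok, np.round(coef, 3))
```

Image pairs rejected by such a test would be skipped, which is how the number of pairs to match can drop below C(P, 2).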
Figure 5. AFTP provides approximate image overlap information automatically

The aforementioned method for dense point cloud matching requires only the aerial images as input. After the transformation from image coordinates (r, c) to photo coordinates (x, y), a bundle block adjustment with a data snooping procedure (Baarda, 1968) is used for preliminary error detection (Kruck, 1984), and quality validation is done by means of ground check points.

First, a block of 11 images is selected for testing the computational efficiency of key point extraction, AFTP and QF, as shown in Table 1. In summary, Table 2 shows the key point density for the 108 test images on different levels of the image pyramid, where a stricter threshold of 0.2 for the distance ratio of SIFT is adopted to extract a smaller number of the best key points. Figure 8 illustrates the key point density on each pyramid level. Apparently, a higher level of the image pyramid, namely a smaller image scale, has a denser cloud of key points. Generally speaking, the density of key points depends not only on the distance ratio threshold of SIFT but also on the amount of image information as well as image quality. For example, Figure 9 illustrates the locations of key points extracted from two images, where red +, green ο, blue ◊, and magenta and cyan markers denote key points extracted on levels 0, 1, 2, 3, and 4, respectively. Image 10123 (left) has better quality than image 60185 (right), so it has a much denser cloud of key points. Both images have the same size of 12096 x 11200 pixels. The same distance ratio of 0.2 yields 359404 and 55269 key points on images 10123 and 60185, respectively. Moreover, Figure 10 illustrates the locations of key points after quality filtering on different pyramid levels. Apparently, they are located at local feature points with good contrast and texture. Table 3 summarizes the number and density of key points without and with QF for the 108 test images. For example, Figure 11 illustrates the locations of key points without and with quality filtering for both the good image 10123 and the bad image 60185. They show clearly that QF retains a smaller number of the best key points.

To investigate the operational efficiency of the method, twelve test cases are also run, and the results are briefly shown in Table 4. In Table 4, Cases I, II and III denote the operation with only QF, with only AFTP, and with both QF and AFTP, respectively. Cases 1, 2, 3 and 4 denote the operation with a distance ratio of 0.20, 0.25, 0.30 and 0.35, respectively. The number of skipped points denotes the number of image points eliminated in the free network adjustment of aerial triangulation (AT). Thus, the ratio of the number of skipped points to the number of matched points, denoted as the skip rate, describes the goodness of matching in each case. As shown in Figure 12, when the distance ratio threshold is set below 0.3, Case III, namely operation with both QF and AFTP, maintains better matching efficiency and gives the best matching with the lowest rate of skipped points.
In general, the larger the distance ratio, the more points are matched per second, but the more points are skipped. Moreover, Table 4 shows that the calculation time is almost the same within each case (I, II, or III).

Bundle Block Adjustment
Now, all 108 RMK DX images and 71 known high-precision ground points are used for quality validation. The efficiency of the new method is also compared with the commercial software LPS/ERDAS Imagine 2010. The latter provides automatic tie point measurement based on least-squares image matching and a default density of 5 x 5 standard tie points per image. Table 5 shows the number of N-fold tie points determined by the new method of SIFT-supported dense point cloud matching and by LPS/ERDAS Imagine 2010.
Apparently, the new method provides a denser cloud of tie points than the default density of LPS/ERDAS Imagine 2010. These tie points are first checked by means of free network adjustment, and the results are shown in Table 6, where the test value is the ratio of the a priori accuracy to the a posteriori accuracy. It shows that the new method provides denser tie points with a photo coordinate accuracy of σ̂_xy = ±1.5 µm = ±0.21 pixel. The skip rate is 1.08%.
Then the bundle block adjustment with control data is performed, using the 6 full control points and 65 independent check points shown in Figure 6. The statistics of both bundle block adjustments, using the tie points measured by the new method proposed in this paper and by the commercial software LPS 2010, are listed in Table 7. The root mean square value of the ground coordinate differences on all 65 check points shows that the points determined by the new method have a horizontal accuracy of ±3.4 cm and a vertical accuracy of ±11.9 cm.
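The check point evaluation described above amounts to a root mean square of coordinate differences. The sketch below shows one way to compute it; combining the X and Y components into a single horizontal value is an assumption, because the text does not say how the horizontal accuracy is formed, and the coordinates used here are synthetic values chosen only for illustration.

```python
import numpy as np

def check_point_rms(adjusted, reference):
    """RMS of coordinate differences at check points.

    adjusted, reference: arrays of shape (n, 3) with X, Y, Z in metres.
    Assumption: horizontal RMS = sqrt(rms_x^2 + rms_y^2).
    """
    diff = np.asarray(adjusted, dtype=float) - np.asarray(reference, dtype=float)
    rms_x, rms_y, rms_z = np.sqrt(np.mean(diff ** 2, axis=0))
    return {
        "x": float(rms_x),
        "y": float(rms_y),
        "horizontal": float(np.hypot(rms_x, rms_y)),
        "vertical": float(rms_z),
    }

# Synthetic example, not data from the paper.
reference = np.array([[100.0, 200.0, 50.0], [300.0, 400.0, 60.0]])
adjusted = reference + np.array([[0.03, 0.04, 0.10], [-0.03, -0.04, -0.10]])
rms = check_point_rms(adjusted, reference)
print(rms)
```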
Moreover, the results show that the new method is suitable for aerial triangulation with high precision and good efficiency. In particular, the new method does not need any information on image overlap. In other words, it does not need the well-known input data of block parameters and strip parameters that are usually required by general commercial AT software.

... all key points. This method uses AFTP to estimate the overlap area and to predict the location of the search window. In order to increase the computational efficiency, an image pyramid with a top-level image of about 700 x 700 pixels is adopted. The automatic extraction, selection and measurement of corresponding tie points is done efficiently, especially without the need for block parameters and strip parameters that provide a priori knowledge of image overlap. Moreover, the density of key points depends not only on the distance ratio threshold of SIFT but also on the amount of image information as well as image quality.
Tests are done using 108 aerial images. They show that operation with both QF and AFTP provides better matching efficiency and the best matching with the lowest rate of skipped points when the distance ratio threshold is set below 0.3. In general, the larger the distance ratio, the more points are matched per second, but the more points are skipped. Owing to the robustness of bundle block adjustment with data snooping, the skipped, namely inaccurate, points can be detected and eliminated. Furthermore, the accuracy of the photo coordinates of image points extracted and matched by the proposed new method reaches a subpixel level, and the accuracy of aerial triangulation is ±1.5 µm ≈ ±0.21 pixels. The skip rate is 1.08%.
Also, the efficiency and benefit of AFTP and QF processing have been verified. These two pre-processing procedures can still be improved in efficiency and feasibility, especially the quality indicator for QF. Moreover, some extended versions of semiglobal matching (SGM) will be developed in the future to perform robust pixelwise matching with quality figures. Their applications, such as high-resolution true ortho image generation, high-resolution and high-precision digital surface model generation, and 3D cyber city modelling with a good LOD (level of detail), will also be studied and developed.

Figure 1. Dense point cloud matching strategy

Figure 2 illustrates the format of the temporary tables used for matched point connection.
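The connection of matched points across image pairs can be sketched with a union-find structure. This is an illustrative sketch rather than the authors' table-based procedure: locations are quantized to multiples of the tolerance, which only approximates the |∆r| < 10^-6 and |∆c| < 10^-6 test, and the function name and the input layout are assumptions.

```python
def connect_tie_points(pair_tables, tol=1e-6):
    """Group matched points from all image pairs into numbered tie points.

    pair_tables: {(img_left, img_right): [((rL, cL), (rR, cR)), ...]}
    Returns {point_number: {image_id: (r, c)}}.
    Assumption: two observations on the same image are the same point when
    their quantized locations agree. If inconsistent matches link two
    different points of one image, only one of them is kept in the group.
    """
    parent = {}
    coords = {}

    def key(img, rc):
        return (img, round(rc[0] / tol), round(rc[1] / tol))

    def find(k):
        parent.setdefault(k, k)
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for (img_l, img_r), rows in pair_tables.items():
        for rc_l, rc_r in rows:
            k_l, k_r = key(img_l, rc_l), key(img_r, rc_r)
            coords[k_l], coords[k_r] = rc_l, rc_r
            parent[find(k_l)] = find(k_r)  # connect the two observations

    groups = {}
    for k, rc in coords.items():
        groups.setdefault(find(k), {})[k[0]] = rc
    return {pn: obs for pn, obs in enumerate(groups.values(), start=1)}

# Hypothetical result tables for three images A, B and C.
pair_tables = {
    ("A", "B"): [((10.5, 20.25), (11.0, 5.5))],
    ("B", "C"): [((11.0, 5.5), (12.75, 3.0))],
    ("A", "C"): [((100.0, 200.0), (90.0, 180.0))],
}
tie_points = connect_tie_points(pair_tables)
print(tie_points)
```

In this example the point seen in A-B and B-C becomes one 3-fold tie point, and the repeated observation on image B is stored only once.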

Figure 2. Connection of matched points

In Eq. (1), G_rc is the gray value of the (r, c)-th pixel and Ḡ is the average of the gray values in the 15 x 15 window. Assuming that the indicator values G_std of all keypoints in one image are normally distributed, the threshold for selecting the best key points is set to the mean plus one standard deviation of all indicator values in that image. Thus, only about 16% of the key points are retained for later matching. In effect, QF is a heuristic filtering step based on the standard deviation of gray levels that discards weak keypoints in the uniformly distributed individual sub-images. Since the indicator value of QF is changeable and the threshold is adjustable, the goodness and suitability of this setting will be verified by the tests.
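The QF indicator and threshold can be sketched as follows. Eq. (1) is not reproduced in this text, so the sample standard deviation (divisor n - 1) used here is an assumption, as are the function names and the truncation of the window at image borders. The synthetic scores only demonstrate why a threshold of mean plus one standard deviation keeps roughly 16% of normally distributed values.

```python
import numpy as np

def g_std(image, r, c, half=7):
    """Standard deviation of gray values in a 15 x 15 window around (r, c).

    Assumptions: sample standard deviation (ddof=1); the window is simply
    truncated when the keypoint lies closer than `half` pixels to a border.
    """
    r, c = int(round(r)), int(round(c))
    window = image[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1]
    return float(np.std(window, ddof=1))

def select_best(scores):
    """Boolean mask of key points whose indicator exceeds mean + std."""
    scores = np.asarray(scores, dtype=float)
    return scores > scores.mean() + scores.std()

# Synthetic, normally distributed indicator values (not data from the paper).
rng = np.random.default_rng(0)
scores = rng.normal(50.0, 10.0, size=100_000)
kept = select_best(scores)
print(f"kept fraction: {kept.mean():.3f}")
```

For a normal distribution, the share of values above the mean plus one standard deviation is about 15.9%, which matches the roughly 16% quoted above.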

Figure 6. Overlap of all 108 test images, and locations of 6 ground control points and 65 check points

3. TESTS

Figure 7. Image overlap determined by AFTP for the 11 test images

Figure 12. Average matching speed and skip rate for Cases I, II and III, denoted by blue, yellow-green and orange colour; solid and dashed lines denote average matching speed and skip rate, respectively

Key point extraction for the 11 images took 6575 seconds. When the distance ratio of AFTP is 0.2, 28 image matching pairs are available, i.e. the loop number is reduced from 55 to 28. The "distance ratio of AFTP" means the distance ratio of SIFT set to a stricter, smaller threshold in order to select a smaller number of the best keypoints as input points to the AFTP. AFTP for these image pairs took 116 seconds. QF for all key points on all 11 images took 567 seconds. Figure 7 illustrates the image coverage determined automatically by the AFTP for these 11 images. It demonstrates an advantage of the proposed new method for dense point cloud matching and aero triangulation, namely that no image overlap information is needed.

Table 1. Calculation time

Table 2. Key point density (unit: points / 1000 pixels) for the 108 test images on different levels of the image pyramid

Table 3. Number and density of key points without and with QF

Figure 11. Location of key points without QF (top) and with QF (bottom) for the images 10123 (left) and 60185 (right)

Table 4. Computational efficiency of AFTP and QF

Table 5. Number of N-fold tie points

Table 6. Results of free network adjustments done by the new method and LPS 2010

Table 7. Results of bundle block adjustments done by the new method and LPS 2010