AUTOMATING THE PHOTOGRAMMETRIC BRIDGING BASED ON MMS IMAGE SEQUENCE PROCESSING

ABSTRACT: The photogrammetric bridging, or traverse, is a special bundle block adjustment (BBA) for connecting a sequence of stereo-pairs and determining the exterior orientation parameters (EOP). An object point must be imaged in more than one stereo-pair, and in each stereo-pair the distance ratio between an object point and its corresponding image point varies significantly. We propose to automate the photogrammetric bridging based on the fully automatic extraction of homologous points in stereo-pairs and on an arbitrary Cartesian datum to which the EOP and tie points are referred. The technique uses the SIFT algorithm, and keypoint matching is based on the similarity descriptors of each keypoint, taking the smallest distance. All matched points are used as tie points. The technique was applied initially to two pairs: the block formed by four images was treated by BBA. The process continues to the end of the sequence and is semiautomatic, because each block is processed independently and the transition from one block to the next depends on the operator. Besides four-image blocks (two pairs), we experimented with other arrangements, with block sizes of six, eight, ten, and up to twenty images (three, four, five, and up to ten bases, respectively). After the whole sequence of image pairs had been sequentially adjusted in each experiment, a simultaneous BBA was run to estimate the EOP of each image. The results for classical ("normal case") pairs were analyzed with the standard statistics regularly applied to phototriangulation, and the figures validate the process.


INTRODUCTION
Generally, a land-based mobile mapping system (MMS) has sensors that provide full orientation (position and attitude) for the stereo image pair sequence. By integrating positional and navigational data from satellite and inertial systems, a solution to the image orientation problem is guaranteed (Tao and Li, 2007). While driving along streets and roads, the MMS acquires sequences of stereoscopic image pairs with two parallel cameras pointed forward or backward from the vehicle (Habib, 1998). With the exterior orientation given by the sensors, object points along and beside the roads, selected in the image pairs, can be mapped by photogrammetric intersection (Silva et al., 2003).
Although such integration has been the standard method in MMS, a few researchers are developing methodologies based entirely on image processing, that is, relying only on data extracted from the road image sequence to solve the image orientation problem, in order to provide either an alternative or a complement to the satellite-inertial integration (Tao et al., 2001). In the realm of photogrammetry, we have proposed a technique called the photogrammetric traverse (Silva et al., 1998), which is a special bundle block adjustment (BBA) for connecting a sequence of stereo-pairs and determining the exterior orientation parameters (EOP).
Of course, as no real external data are considered, we need to provide an arbitrary datum such as a local Cartesian reference frame. We did so with a semi-automatic strategy based on optical flow to map conjugate points in the image domain along the sequence so they could be used as tie points (Barbosa et al., 2007).
We now propose to automate the photogrammetric bridging based on the fully automatic extraction of homologous points in the stereo-pair sequence, referring the EOP and tie points to the arbitrary object-space Cartesian datum. The hypothesis is that the orientation of the whole image sequence can be estimated with respect to the local reference system. In the following sections the methodology, experiments, results, and analysis are presented and discussed in order to verify this hypothesis.

BACKGROUND THEORY
In general both cameras and images require orientation with respect to a Cartesian reference frame (CRF).

CRF for Image Coordinates:
An internal frame is defined inside the camera and is used for referencing the interior orientation parameters (IOP), such as the focal length, principal point, lens distortion, and affinity parameters. The pair of cameras is mounted on a rigid metal structure so that both optical axes are approximately horizontal and parallel to each other. In each camera the x-axis points to the right, the y-axis is upward, and the z-axis coincides with the optical axis pointing to the image sensor inside the camera, which defines the so-called photogrammetric coordinate system (PCS), whose origin as usual coincides with the center of the image. Calibration procedures estimate the IOP of each camera and also the relative orientation parameters (ROP) between the two cameras.

CRF for Object Coordinates:
The exterior orientation parameters (EOP) are referred in general to a CRF realized on the object. In this particular situation a local CRF is arbitrarily established by the camera pair system (CPS). Considering an observer behind the cameras, the CRF origin is set at the perspective center (PC) of the left camera; the stereo base materializes the X-axis, pointing to the PC of the right camera; the Y-axis is upward; and the Z-axis coincides with the optical axis pointing to the image sensor inside the camera, in other words toward the observer behind the CPS.
Both image and object Cartesian coordinates are then right-handed reference systems. Rotation angles between the corresponding image and object reference systems are positive counterclockwise. The two reference systems are related to each other by scale, translation, and rotation, and they are approximately parallel.

Image Navigation and Orientation:
The positioning satellites constitute what is currently called the Global Navigation Satellite System (GNSS), which provides global coordinates and navigation parameters for many applications. Inertial sensors such as accelerometers and gyroscopes sense, respectively, the specific force on a rigid body and its axis angular rates. Specific force and angular rate are suitably processed by a Kalman filter so that the position, velocity, and attitude (PVA) of the rigid body can be estimated. Currently the integrated solution of satellite and inertial data provides an efficient method to orient the cameras and their products, such as the image pairs (Rouzaud and Skaloud, 2011). For non-real-time photogrammetric purposes, usually, the P is computed from the satellite data and the A from the inertial data.
The combined satellite and inertial PVA estimates are not always available. Indoor images and dense urban street images, for instance, may not have the EOP computed from PVA, as these estimates may fail due to the lack of satellite data, while the inertial data accumulate noise because of high frequency rates, temperature, and the like. Some computer vision and mapping researchers have started experimenting with techniques to orient images based only on image processing, or combining it with other technologies.

Randeniya et al. (2008) presented the calibration results of an integrated visual and inertial system, a passive technique suitable for indoor environments. Using only data from the visual system in an outdoor application, the authors obtained stimulating results for camera orientation. Li and Sclaroff (2007) published a solution that uses a stereo camera, optical flow, and correspondence to estimate the three-dimensional movement fields of the objects in the scenes, and Giachetti et al. (1998) did so for road navigation. Griessbach et al. (2010) presented a multi-sensor approach to the orientation task, in combination with an optimal fusion of the measurements that provides the best estimation accuracy of the parameters. Depending on the application, the main inertial sensors are combined with one or more other specific sensors, such as optical sensors (stereo cameras), GPS, and others. The main investigation points are the development of methods and algorithms for complex state estimation and their implementation in a real-time software and hardware solution, which serves as a base navigation system for different applications, for example indoor navigation, driver assistance for vehicles and trains, the estimation of exterior orientation for airborne and spaceborne camera systems, and attitude control for small satellites. Amami et al. (2014) produced a personal MMS based mainly on cameras and the use of low-cost GNSS and inertial sensors to provide a bundle adjustment solution with initial values. The system was tested indoors and outdoors with different GPS coverage, surrounding features, and narrow and curvy paths. Tests showed that the system is able to work in such environments, providing 3D coordinates with better than 10 cm accuracy. Silva et al. (2007) and Barbosa et al. (2007) implemented a technique to orient the image pairs based only on image processing and photogrammetric techniques, without any other sensor. The solution was based on the vehicle velocity estimated from dense optical flow computation. Veth (2011) synthetized the techniques and the progress related to navigation fully relying on images. He grouped the techniques into two classes of methods: one based on optical flow and the other on feature tracking. Both use the apparent movement of image regions among the sequential frames to determine the relative movement of a camera.

In short, every image-based technique for positioning requires three basic operations: finding suitable regions in the images, corresponding regions to establish homologous points between them, and estimating the EOP. Each operation has its own drawbacks, grouped into scientific and technological problems: interest area selection, correspondence, and pose estimation, respectively. In photogrammetry, these operations are denominated point or feature extraction, correspondence (correlation), and EOP estimation. Roncella et al. (2005) used the term photogrammetric bridging. Silva et al. (1998, 1999) had proposed the photogrammetric traverse. It is a special BBA for connecting a sequence of terrestrial (horizontal) stereo-image pairs taken along the vehicle trajectory, so that the stereoscopic base is approximately perpendicular to the main surveying axis or, equivalently, the optical axes are approximately parallel to that axis. Of course, the BBA determines the exterior orientation parameters (EOP).

Photogrammetric Bridging
Figure 1 illustrates the concept, where LiRi represents the stereoscopic base at instant i. With L for left and R for right, the positions where the images are taken, i.e., the PC, are represented by L1 and R1 at instant t1, and so on for the sequential instants ti. The object point P is situated between the third and the fourth bases and appears in the images of bases 1, 2, and 3. Its spatial position (X, Y, Z) with respect to the local and arbitrary CRF (LACRF) can be estimated by simple intersection with the third-pair data, by double intersection with the second- and third-base data, or by triple intersection of the six images. Generalizing this principle leads to the concept of phototriangulation, with the particularity that in this case the scale varies severely in each image, because the range from the object point P to the bases varies from a few meters to dozens of meters ahead.

SIFT Algorithm
The algorithm was introduced by Lowe (1999) and is based on earlier contributions from many authors, such as Moravec (1981) and Brown and Lowe (2002). The Scale Invariant Feature Transform (SIFT) algorithm is a set of four main steps that successively filter an image to extract features that are invariant to image scaling and rotation, and partially invariant to changes in illumination and camera orientation. We suppose that, when using the images of a sequence, the invariance characteristics are kept, especially in the high-contrast regions of the images, where it is possible to extract edges in general and points in particular.
The four steps were described by Lowe (2004):
1. Scale-space extrema detection, the first stage of computation, searches over all scales and image locations. It uses a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.
2. Keypoint localization applies a detailed model to determine location and scale at each candidate location. Keypoints are selected based on measures of their stability.
3. Orientation assignment attributes to each keypoint one or more orientations based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location of each feature, thereby providing invariance to these transformations.
4. Keypoint descriptor closes the process: the local image gradients are measured at the selected scale in the region around each keypoint and transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
In the realm of image orientation this process searches for objects as points in the images. SIFT extracts a large amount of keypoints (the so-called clouds of points), so that the consequence of an eventual error due to scene variation is reduced. At the end of the process, the algorithm produces four attributes for each keypoint in a feature vector: the x and y coordinates, magnitude, and orientation.
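Step 1 above (scale-space extrema detection) can be sketched in a few lines. The following toy example, assuming NumPy and SciPy, builds a small difference-of-Gaussian stack and flags pixels that are extrema among their 26 neighbors in (scale, row, column); the sigma values and threshold are illustrative choices, not Lowe's exact parameters, and a real SIFT would add sub-pixel localization, edge filtering, orientation assignment, and descriptor computation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(image, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.03):
    """Toy sketch of SIFT step 1: extrema of a difference-of-Gaussian stack.

    Returns (x, y, sigma) candidates; `sigmas` and `thresh` are illustrative.
    """
    blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
    dogs = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    # A pixel is a candidate if it is a max or a min among its 26
    # neighbours in the 3-D (scale, row, col) DoG stack.
    maxf = maximum_filter(dogs, size=3)
    minf = minimum_filter(dogs, size=3)
    keypoints = []
    for s in range(1, dogs.shape[0] - 1):
        ys, xs = np.where(
            ((dogs[s] == maxf[s]) | (dogs[s] == minf[s]))
            & (np.abs(dogs[s]) > thresh)
        )
        keypoints.extend((x, y, sigmas[s]) for x, y in zip(xs, ys))
    return keypoints
```

A bright blob, for instance, shows up as a minimum of the DoG stack at its center, which is exactly the scale- and location-covariant behavior the section relies on.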

AUTOMATING THE PROCESS
The total sequence of frontal images is processed sequentially by forming partial blocks of four, six, eight, or more images, that is to say, partial blocks of two, three, four, or more image pairs, as we will present in the coming sections. The photogrammetric-based orientation process demands automation.
It begins with the SIFT-based determination of a cloud of points from each image of a partial block. The correspondences among cloud points are determined by a correlation technique, and the image coordinates of all corresponding points are transformed to photo-coordinates. The EOP are set arbitrarily for the first pair in order to establish a LACRF. The object-space coordinates are approximated by photogrammetric intersection from the EOP and photo-coordinates using the inverse collinearity equations. The block of four, six, eight, or more images is then phototriangulated. The process is repeated sequentially until the last base (pair) is reached and the EOP of each image are estimated. This is called the photogrammetric traverse or bridging.
Matlab (v. 7.0) was used to implement the computational solution, which is composed of the SIFT algorithm, matching, intersection, and phototriangulation (BBA), besides existing software and libraries used to construct and visualize images and graphics.

Point Clouds by SIFT Algorithm
The relative position among objects in the scene is an important feature when we want to recognize objects. The locations are not expected to change significantly from one image to a nearby one, as is the case in a sequence; otherwise many errors may occur. In this project the SIFT technique was applied to extract points with common characteristics among a block of images, instead of for object recognition.
The amount of cloud points is directly proportional to the size of the image and to the threshold on the ratio between the first and second smallest distances of the feature vectors; consequently, so is the computing time. Around 12 to 15 thousand keypoints were found in each of the first four images of size 1920 x 1280, and around six to seven hundred in the images reduced by 80% (384 x 256). The images were reduced by bicubic interpolation.

Manifold Corresponding Points
Consider a block of four images (two pairs), where L1-R1 form pair 1, at epoch tk, and L2-R2 form pair 2, at epoch tk+1. SIFT was applied to each image, extracting four distinct point clouds. Taking the pair L1-R1, the homologous points were found based on the minimum Euclidean distance between their feature vectors in the vicinity of every point of the left image (L1), that is to say, the closest neighbor. Taking now the two left images (L1 and L2) of the first and second pairs, the points of the first-pair left image (L1) were correlated with the points of the second-pair left image (L2) based on the same criterion, resulting in a set of forward homologous points. The procedure was repeated for L2-R2 and finally for R1-R2, so the correspondence technique was applied twice for the lateral (stereo) images and twice for the forward images.
The higher the threshold, the larger the number of correspondences among the keypoints. Thresholds higher than 0.6 may cause false matchings. That happened when using 0.8 for the original-size images, which produced 1862 four-fold keypoint correspondences (only 68 for the 80%-reduced images). The threshold of 0.3 gave 89 four-fold correspondences for the original size and only 5 matched points for the 80%-reduced images.
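The distance-ratio criterion described above can be sketched as follows; this is a minimal NumPy version in which the descriptors are plain arrays and the function and variable names are ours:

```python
import numpy as np

def match_ratio(desc_a, desc_b, ratio=0.6):
    """Match descriptors of image A to those of image B.

    For each keypoint of A, its nearest neighbour in B (smallest Euclidean
    distance between feature vectors) is accepted only when that distance
    is below `ratio` times the second-smallest distance; larger ratios
    (e.g. 0.8) yield more, but less reliable, correspondences.
    Returns (index_in_A, index_in_B) pairs.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, nearest))
    return matches
```

Chaining this matcher over L1-R1, L1-L2, L2-R2, and R1-R2 and intersecting the results yields the manifold (four-fold) correspondences used as tie points.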
As the partial blocks of the whole sequence of frontal terrestrial images may have more than two bases, the SIFT algorithm is applied to all images of every partial sequence. Following the procedure above, the homologous or corresponding points are automatically determined by a four-, six-, eight-, or ten-fold image matching process. The planar image coordinates (r, c) of the manifold correlated points are then transformed to the PCS using (f, x0, y0). All these selected points are used as tie points in the partial block or sequence.

Tie Points Terrain Coordinates by Photogrammetric Intersection
In order to automate the photogrammetric bridging process, the tie points must have their approximate object-space coordinates computed by photogrammetric intersection. Working out the collinearity equations, we can write the linearized and modified pseudo-observation equations based on the arbitrary EOP and the tie-point photo-coordinates after some algebraic manipulation. A linear least-squares adjustment then estimates the corresponding XYZ tie-point coordinates, without needing approximate values, by solving the system of 3 x 3 normal equations (eq. 1), A^T A x_t = A^T l_b (Silva et al., 2003).
The vector x_t contains the three XYZ coordinates to be used as approximate values in the BBA of the partial sequence. The design matrix A ranges from 8 x 3 (double intersection) up to 20 x 3 elements (five-fold intersection). Each image in which the point appears contributes two rows to A, with coefficients formed from the focal length, the reduced photo-coordinates, and the elements of the rotation matrix, and the corresponding elements of the pseudo-observation vector l_b combine those coefficients with the PC coordinates (the I, J, K terms defined with Figure 1). The row dimensions of A and l_b thus vary from 8 to 20 according to the four-fold up to ten-fold intersection case.
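A hedged sketch of this intersection in code (assuming NumPy; the sign convention below is ours, and a production version would follow the formulation of Silva et al. (2003)): each image contributes two rows to A and l_b, and the 3 x 3 normal system is solved in the least-squares sense.

```python
import numpy as np

def intersect_point(obs):
    """Linear photogrammetric intersection of one tie point.

    `obs` is a list of (x, y, f, R, pc) tuples, one per image in which the
    point appears: reduced photo-coordinates (x, y), calibrated focal
    length f, 3x3 rotation matrix R, and perspective-centre coordinates pc.
    Each image contributes two rows, giving the 8x3 (four images) up to
    20x3 (ten images) system described in the text.
    """
    A, lb = [], []
    for x, y, f, R, pc in obs:
        for rot_row, coord in ((R[0], x), (R[1], y)):
            a = f * rot_row + coord * R[2]  # coefficients for (X, Y, Z)
            A.append(a)
            lb.append(a @ pc)               # I, J, K combined with the PC
    A, lb = np.asarray(A), np.asarray(lb)
    # Least-squares solution of the 3x3 normal equations A^T A x = A^T lb.
    xyz, *_ = np.linalg.lstsq(A, lb, rcond=None)
    return xyz
```

With four or more rays from distinct perspective centers, the system is overdetermined and the tie point's XYZ is recovered directly, with no initial approximation.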

Special BBA
This is a special use of the BBA because the optical axes are approximately parallel to the pavement and the bases move forward, so a feature is imaged at different scales. It is a phototriangulation of terrestrial, frontal images in which the photo scale varies significantly within a stereo-pair and along the sequence.
Together with some of the base EOP, the base length BL (eq. 4) and the base advancement BA (eq. 5) were also constrained:

BL_k = [(X"_k - X'_k)^2 + (Y"_k - Y'_k)^2 + (Z"_k - Z'_k)^2]^(1/2)   (4)

BA_k = [(X'_k+1 - X'_k)^2 + (Y'_k+1 - Y'_k)^2 + (Z'_k+1 - Z'_k)^2]^(1/2)   (5)

where the one-prime (') and two-prime (") variables are, respectively, the left and right 3D PC coordinates referred to the LACRF, and k and k+1 refer to epochs t_k and t_k+1.
The collinearity equations were the analytical mathematical model, the indirect observation equations formed the least-squares estimation method, and the unit weight matrix synthetized the simplified stochastic model. Additionally, we constrained the calibrated IOP elements, the first-pair EOP of the sequence, and the BL and BA. The general solution is equation (6):

x = (A^T P A + C^T Pc C)^-1 (A^T P l + C^T Pc lc)   (6)

where A is the regular design matrix; P is the unit weight matrix related to the observations (photo-coordinates); Pc is the weight matrix referred to the constraints; C is the "constrained design matrix"; and l and lc are the vectors expressing the differences between computed and observed (l) and pseudo-observed (lc) values, respectively, along the iterations.
The a posteriori unit variance estimates the general behavior of all variables as a reference for the global quality of the least-squares adjustment. The very well-known equation (7) is

sigma0^2 = (v^T P v + vc^T Pc vc) / (n + nc - u)   (7)

where v and vc are the residual and pseudo-residual vectors, respectively; n and nc are the numbers of observations and pseudo-observations (constraints); and u is the total number of unknowns (corrections to the approximate parameters) to be estimated.
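A minimal sketch of the constrained solution (eq. 6) and the unit variance (eq. 7), assuming NumPy and that the matrices are already assembled:

```python
import numpy as np

def constrained_lsq(A, P, l, C, Pc, lc):
    """One iteration of the constrained Gauss-Markov model (eq. 6):
    x = (A'PA + C'PcC)^-1 (A'Pl + C'Pc lc)."""
    N = A.T @ P @ A + C.T @ Pc @ C
    t = A.T @ P @ l + C.T @ Pc @ lc
    return np.linalg.solve(N, t)

def sigma0_squared(v, P, vc, Pc, n, nc, u):
    """A posteriori unit variance (eq. 7)."""
    return (v @ P @ v + vc @ Pc @ vc) / (n + nc - u)
```

Tight weights in Pc pin the constrained quantities (first-pair EOP, IOP, BL, BA) close to their pseudo-observed values, while loose weights let them float.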
Considering a partial or the whole sequence, i.e., a partial or the full bundle block, n is calculated by equation (8):

n = 2 tF nTP   (8)

where tF is the total number of images (photos) in the block (partial or full) and nTP is the number of tie points.
The number of constraints, nc, is calculated by equation (9):

nc = 2 (EOP + IOP) + nBL + nBA   (9)

where EOP is 6; IOP is 4 (f, x0, y0, k1); nBL is the number of bases; and nBA is the number of constraints related to the base advances.
Finally, u is the total number of unknowns (eq. 10):

u = 6 tF + 3 nTP   (10)

Concerning the stochastic model, we can choose at least four distinct methods to assign values to the P (weight) matrix.
According to the relative position of the tie point in the image plane, P can be assigned as a function of the distance to the image center, or of the parallax of the conjugate tie points, or as the identity matrix (P = I), or finally as a diagonal matrix P = 1/(pixel_size)^2 I, which is the one we in fact used in the project.
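One plausible reading of the bookkeeping in equations (8) to (10), under assumptions that are ours and not confirmed by the text (every tie point measured in all tF photos; the EOP and IOP constraints covering the two images/cameras of the first pair):

```python
def block_counts(tF, nTP, nBL, nBA, EOP=6, IOP=4):
    """Observation/constraint/unknown bookkeeping for one block (sketch).

    Assumes every tie point is measured (x and y) in all tF photos, the
    EOP and IOP constraints cover the two first-pair images/cameras, and
    the unknowns are six EOP per image plus XYZ per tie point.
    Returns (n, nc, u).
    """
    n = 2 * tF * nTP                  # our reading of eq (8)
    nc = 2 * (EOP + IOP) + nBL + nBA  # our reading of eq (9)
    u = 6 * tF + 3 * nTP              # our reading of eq (10)
    return n, nc, u
```

For a four-image block with 89 tie points (the 0.3-threshold case), this gives n = 712 observations against u = 291 unknowns, a comfortable redundancy even before counting the nc constraints.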

Sequential EOP Estimation by Photogrammetric Bridging:
Every small block of the full sequence acts as a partial block that is solved by a BBA of four or more bundles. This technique has been used to map streets and roads (Silva et al., 2000, 2003). We can form partial blocks of two bases, or of three bases (six images) considering two approaches: no base skipped, giving eight blocks (bases 1-2-3, 2-3-4, and so on until 8-9-10); or one base skipped, giving four small blocks (bases 1-2-3, 3-4-5, 5-6-7, and 7-8-9). Table 3 (Section 4, Experiments) summarizes more combinations.
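The sliding-block bookkeeping described above can be sketched with a small helper (the function name and signature are ours):

```python
def partial_blocks(n_bases, block_bases, skip=0):
    """Enumerate the partial blocks of the photogrammetric bridging.

    n_bases: total bases in the sequence;
    block_bases: bases per partial block (2, 3, 4, 5 -> 4, 6, 8, 10 images);
    skip: bases skipped when advancing (0 means advance by one base).
    Returns a list of tuples of 1-based base indices.
    """
    step = 1 + skip
    blocks = []
    start = 1
    while start + block_bases - 1 <= n_bases:
        blocks.append(tuple(range(start, start + block_bases)))
        start += step
    return blocks
```

For the ten-base sequence, `partial_blocks(10, 3)` reproduces the eight no-skip blocks 1-2-3 through 8-9-10, and `partial_blocks(10, 3, skip=1)` the four skipped blocks 1-2-3, 3-4-5, 5-6-7, 7-8-9.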

BBA for the Whole Sequence:
After the whole sequence has been covered by the partial blocks, the EOP of each image are estimated through the sequential adjustment technique. In this project, a BBA of the whole sequence then follows in order to homogenize all EOP estimates. Here it was adapted to work in a semiautomatic manner.

Preparation
Two Canon T3i (600D) cameras were mounted on a rigid bar to form a stereo-pair with a base of 0.4 m. An electronic device synchronized the shots. The original image size is 1920 x 1280, corresponding to a pixel size of 0.0115 mm (0.023 mm for the 50%-reduced images).
The calibration parameters of the camera system were estimated with the software CC v2011.05 (Galo, 2013), which estimates the interior and relative orientation parameters using the original image pairs taken in front of a target wall with well-determined 3D coordinates.
Ten bases were marked on the street pavement, approximately every one meter, to control the experiments. The stereo-camera was carried by a human operator over the marks. The sequential image pairs were taken along the sequence and saved in JPG format.
The LACRF is centered at the first-pair left camera. A selection of 18 tie points in the first pair had their XYZ coordinates determined by topographic surveying with a total station. The first-pair EOP were estimated by photogrammetric resection and were constrained in the photogrammetric bridging process.

Tie point extraction and matching by SIFT
Considering the first two pairs (Figure 2), we continued to experiment with the SIFT algorithm (initial tests are not reported in this article) to check the amount of extracted keypoints in the image plane (Table 1).
The amount of extracted keypoints depends directly on the threshold value. Besides that, the quantity extracted from the forward pairs (1-3; 2-4) was larger than that from the stereo-pairs (1-2; 3-4). Considering all 20 images of the sequence, the average number of extracted points was 3702 (minimum 3289 in image 6, maximum 4513 in image 12).

Arrangements of the experiments
Table 3 summarizes the experimented arrangements. Its first column identifies the experiment; B and I are the numbers of stereo-bases and images, respectively; SB is the number of skipped bases; PB is the number of partial (sequential) blocks in the sequential BBA, that is to say, the photogrammetric bridging; and ATP is the average number of tie points used per partial block. Skipped bases are associated with the progress of the photogrammetric bridging. Exp 1 adjusted sequentially 9 partial blocks, so that images 1 to 4 formed the first partial block, 3 to 6 the second, and so on until the last block, formed by images 17 to 20, with the largest average number of tie points. Exp 2 progressed with partial blocks formed by 6 images (1 to 6, then 3 to 8, 5 to 10, 7 to 12, and so on up to the last partial block formed by images 15 to 20). Exp 3 also used partial blocks of six images, but advancing as 1 to 6, 5 to 10, 9 to 14, and 13 to 18. Exp 9 formed two partial blocks of five bases (10 images), 1 to 10 and 9 to 18, with only 15 tie points on average.

For better EOP estimation, the distribution of tie points in the image plane must be balanced. We divided the image plane into nine sectors (Figure 3) and counted the number of tie points in each sector (Table 4). A normalized index was computed based on the quantity of tie points and on weights (3, 2, or 1) according to the sector. The ideal index value would be 1.0, expressing that all tie points would lie in the corner sectors. A low index expresses an unfavorable condition for estimating the EOP by BBA.

We opted to present and discuss the results related to the extreme configurations (Exp 1 and Exp 9) and one in the middle (Exp 5). The comparison led to the discrepancies and, from them, to the absolute error (Figure 4) and the root mean square (rms) error (Figure 5) for the Z_C coordinates, as shown in Table 6. The experiments had no control or check for the attitude parameters, so the average standard deviation was computed for all oriented images (Table 7). The analyses of the absolute and rms errors, and also of the average standard deviation of the attitude parameters, point to Exp 1 as the one with the least error propagation, as expected from the index 0.70 shown in Table 4. Figures 6 and 7 are plots of the adjusted perspective centers corresponding to Exp 1.

The initial goal was to reach full automation, executing the entire project from the tie-point extraction to the complete BBA. However, human intervention is still needed because of decision issues related to outlier elimination, weight matrix balancing, and the integration of the sequential (bridging) adjustment into the complete-sequence phototriangulation, to mention the most demanding at this point.
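The sector index can be sketched as follows; the exact weight placement and normalization are our assumptions (corners weigh 3, edges 2, the center 1, normalized by the all-in-corners maximum), chosen so the ideal value is the 1.0 quoted above:

```python
def distribution_index(sector_counts, weights=None):
    """Normalized tie-point distribution index (hedged reconstruction).

    sector_counts: 3x3 nested list, rows U/C/B by columns L/C/R as in
    Figure 3. Assumes corner sectors weigh 3, edge sectors 2, the centre 1,
    and normalization by the maximum weighted count, so that 1.0 means all
    tie points lie in the corner sectors.
    """
    if weights is None:
        weights = [[3, 2, 3], [2, 1, 2], [3, 2, 3]]
    total = sum(sum(row) for row in sector_counts)
    if total == 0:
        return 0.0
    weighted = sum(w * n for wrow, nrow in zip(weights, sector_counts)
                   for w, n in zip(wrow, nrow))
    return weighted / (3 * total)
```

Under these assumptions, points concentrated in the central sector drag the index toward 1/3, the unfavorable condition for EOP estimation noted above.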

BBA RESULTS AND ANALYSES
The coordinates of all the PC had lower error propagation than the attitude parameters. The approximately 1.0 m constraint on the base advance may explain this.
Figure 7 -Orthogonal plot of all PC estimated positions in the LACRF (Exp 1).

CONCLUSION
The SIFT algorithm adapted well to this kind of stereo-pair sequence, especially for the changing-scale forward pairs. With the same algorithm we were able to automate the tie-point extraction and matching along the whole sequence. The amount of tie points obtained from the extracted keypoints was sufficient for the BBA, although in some images they are not uniformly distributed.
A fully automatic process is still in demand. In our particular application we need to automate the photogrammetric process for the whole sequence when going from the bridging to the simultaneous, final adjustment.
Although a solution has been reached, so that the hypothesis has not been refuted, a completely confident result still depends on external data, as we simulated an odometer measuring 1 m between the sequential bases. We therefore also constrained the base length, besides the 1 m base advancement. Both were necessary to avoid larger drift and torsion along the street image sequence.

FUTURE AND CONTINUED RESEARCH
Although the SIFT algorithm has demonstrated efficiency in extracting a large amount of keypoints that are then corresponded to become tie points, the quality of this determination needs to be evaluated in this particular application.
Although the constraints related to base length and base advances proved effective to control the error propagation, it will be valuable to test and prove the contribution of low-cost inertial sensors to help the attitude parameter estimation.
An algorithm to detect and localize outliers must be inserted in the photogrammetric chain.
The full photogrammetric process must be automated so that no human interaction is demanded. Decision rules based on statistical analyses may contribute to composing the weight matrix as a function of parallax, Euclidean distance, or other criteria.
New camera arrangements, such as three cameras in line (collinear) forming a double stereo camera and three coplanar cameras, are under construction to be tested. The coplanar arrangement means that the central camera of the collinear arrangement is lifted so that the three PC form a triangle, with all optical axes parallel.
Another idea to be developed is the use of omnidirectional cameras, so that a sequence of panoramic views may be investigated in a way similar to what we explored in this project.

Figure 1 -
Photogrammetric bridging or traverse: conceptual illustration (Silva, 1998).
Notation for the intersection equations: I = (r11 XC + r12 YC + r13 ZC), J = (r21 XC + r22 YC + r23 ZC), K = (r31 XC + r32 YC + r33 ZC); (x, y; f) are the photo-coordinates and the calibrated focal length, respectively; (x0, y0) are the principal point coordinates; {XC, YC, ZC} are the object-space PC coordinates; and rij are the elements of the Eulerian rotation matrix.
Table 3 -
Nine arrangements of the partial (sequential) blocks and the corresponding average amount of tie points.

Figure 3 -
Figure 3 -Image plane sectors: U for upper, C for center, B for bottom, L for left, and R for right.

Figure 4 -
Figure 4 -Z C absolute error at each PC.

Figure 5 -
Figure 5 -Z C rms error at each PC.

Figure 6 -
Figure 6 -Estimated positions of the PC plotted in the LACRF after BBA Exp 1.

Table 2 -
Matched key-points from forward and lateral pairs. Table 2 shows the results for a threshold equal to 0.6 and the 50%-reduced images (960 x 640 pixels), the same values used for all experiments. This confirms that SIFT is more efficient at matching keypoints from different-scale forward images than from similar-scale lateral (stereoscopic) images, as declared by its creator.

Table 4 -
Average amount of tie points considering all images of the sequence and the average indices for the experiments.
Table 5 presents all the LACRF coordinates of the 20 PC estimated by BBA for experiments 1, 5, and 9. The only check condition is to compare the estimated Z_C coordinates to those approximately expected (multiples of one meter).

Table 5 -
Estimated terrain coordinates by the simultaneous BBA.

Table 6 -
Absolute and rms errors for the Z C .

Table 7 -
Estimated average standard deviation for the attitude parameters.