OBLIQUE MULTI-CAMERA SYSTEMS - ORIENTATION AND DENSE MATCHING ISSUES

The use of oblique imagery has become a standard for many civil and mapping applications, thanks to the development of airborne digital multi-camera systems, as proposed by many companies (Blomoblique, IGI, Leica, Midas, Pictometry, Vexcel/Microsoft, VisionMap, etc.). The indisputable virtue of oblique photography lies in its simplicity of interpretation and understanding for inexperienced users, which allows oblique images to be used in very different applications, such as building detection and reconstruction, building structural damage classification, road land updating and administration services, etc. The paper gives an overview of the current commercial oblique systems and presents a workflow for the automated orientation and dense matching of large image blocks. Perspectives, potentialities, pitfalls and suggestions for achieving satisfactory results are given. Tests performed on two datasets acquired with two multi-camera systems over urban areas are also reported.


INTRODUCTION
Oblique airborne multi-camera systems are becoming a standard sensor technology across a growing geospatial market, with multiple applications alongside the more standard vertical photography and its derivatives: DSM, orthophotos and maps. In line with the automation of the photogrammetric workflow, digital camera technology has advanced and the definition of 'large format' has been pushed forward. Today we can observe sensors with resolutions of half a billion pixels (cf. B660 frame camera), and Fritsch and Rothermel (2013) even claimed that the 'pixel race' in airborne sensor technology resembles that of the consumer camera market. The first oblique aerial photograph was taken by James Wallace Black in 1860 over Boston, US. Then, in the 1930s, the U.S. Geological Survey and the U.S. Army Corps of Engineers used the Fairchild T-3A five-lens film camera for mapping, surveillance and reconnaissance purposes. The concept of fitting several sensors into a single camera housing emerged again with the arrival of digital technology (Petrie, 2009a). It was a cost-effective solution to reach high spatial resolution without paying astronomical prices for it. However, the true revival of oblique systems happened in 2000 with the advent of Pictometry International and its patented system producing vertical and slant views, accessible for example in some bird's eye views of Microsoft Bing Maps.
Current oblique camera systems come in a variety of configurations. Reviews of the state of the art of oblique systems are reported in (Karbo and Schroth, 2009; Petrie, 2009a; Lemmens, 2011). The systems differ in the number of sensors, format, arrangement, mode of acquisition, spectral sensitivity, etc. (Table 1). Following the division of digital airborne systems presented in Petrie (2009a), one can distinguish between:
Fan configuration: it extends the cross-track ground coverage and mainly comes as twin cameras, e.g. Trimble AIC x2 or Dual DigiCAM. Both systems are offered in a number of CCD chip sizes ranging from 22 to 60 MPix, with interchangeable lenses up to 150 mm. Two frames can be merged into one large frame in order to double the sensor size. Manufactured as modular systems, the sensors can be rearranged within the mounting to serve either as a vertical or as an oblique system. A quite innovative fan solution is offered by VisionMap with the A3 sensor, a stepping frame system which captures up to 64 images per sweep, corresponding to a field of view of 109°. With the employed 300 mm focal length and highly advanced motion compensation procedures, the A3 system can be flown at low or high altitudes while keeping a high ground resolution, i.e. a small Ground Sample Distance (GSD) (Vilan and Gozes, 2013). Other systems with three or more fans are reported in (Petrie, 2009a).
"Maltese-cross" configuration: it consists of a single nadir camera and four cameras tilted towards the cardinal directions by 40-50°. This configuration is the most common and also the most diverse. It contains small-, medium- and large-format frame cameras. There are two observable development tendencies within the configuration: one is present among small- and medium-format systems and stresses modularity and flexibility (IGI, Midas TRACK'AIR, Leica), the other produces 'closed' systems and invests in larger, often more powerful, sensor chips (Microsoft, CICADE, Icaros).
Hasselblad sensor respectively. These types of camera systems are not really oblique: the single cameras are near vertical and do not show building façades, but they can be adapted to become oblique systems.
The boom of and large interest in oblique photography is owing to its primary quality: it reveals the building façades and, normally, footprints. Consequently, it becomes easier for non-expert users to interpret the data, as it is closer to what is seen from the ground. The possible applications based on oblique aerial views are multiple. To name a few, Mishra et al. (2008) use oblique images in road land updating, Lemmens et al. (2008) prove Pictometry feasible for building registration and preliminary parcel boundary determination, while Xiao et al. (2012) address the building detection and reconstruction problem. The monitoring of mass events is presented in (Kurz et al., 2007; Grenzdoerfer et al., 2008). Last but not least, oblique systems are studied in the context of urban classification and 3D city modelling (Wang et al., 2008; Fritsch et al., 2012; Fritsch and Rothermel, 2013; Gerke and Xiao, 2013; Nex et al., 2013). Because the aforementioned applications are considered in metric space, they are feasible only with accurate exterior orientations, up-to-date camera calibrations and, if only single images are available (monoplotting), a reliable DTM. While quick calibrations and direct georeferencing may suffice for applications oriented towards qualitative evaluation (e.g. involving manual inspection), in more stringent cases (e.g. mapping involving dense image matching) optimally adjusted parameters are mandatory. To exploit oblique imagery on a large scale, existing adjustment workflows must be adapted accordingly. So far, a combined bundle adjustment including all cameras' parameters, carried out in an automatic fashion, is reported to be a difficult task, if not an unsolved one (Jacobsen, 2008; Gerke and Nyaruhuma, 2009; Wiedemann and More, 2012). Commercial solutions also have great difficulty in correctly handling aerial datasets composed of oblique and nadir images, with large scale and radiometric changes.
In this paper, we present a way to tackle hundreds or even thousands of oblique aerial images in a single bundle block adjustment. Then, the paper reports open issues in the context of dense image matching. In particular, potentialities, pitfalls and suggestions for achieving satisfactory results are given.

AUTOMATED IMAGE TRIANGULATION
Point reconstruction from two or more images is a function of camera calibration, exterior orientation and the quality of the image measurements. Inaccuracy in any of these factors is echoed in the ultimate accuracy of the reconstructed object points, and is even more pronounced in slanted views. The key figures, interior and exterior parameters, are nowadays often known a priori, as retrieved with a prior calibration procedure or measured directly with on-board sensors (GNSS/IMU), respectively. Nonetheless, these parameters are generally regarded as approximate if one has metric and automatic applications in mind, therefore an adjustment in a least squares sense is a must. For over a decade, many reliable photogrammetric workflows have emerged with the ability to automatically find homologous points across multiple aerial views and use these observations in the final refinement stage, the bundle block adjustment. Popular software packages for tie point extraction use standard area-based techniques (such as Normalized Cross Correlation and Least Squares Matching) or feature-based methods (SIFT, SURF, etc.) coupled with robust estimators to remove possible wrong correspondences. These approaches were optimized for vertical aerial imagery, as this was the 'workhorse' of aerial photogrammetry, and in effect perform poorly on other geometries (Jacobsen, 2008). Oblique images are richer in content than nadir views. They unveil the otherwise hidden façades and building footprints, but in return they impose more severe occlusions as well as scale and illumination changes (Figure 2). Multi-view acquisitions should mitigate the problem, but this is traded for a degradation in similarity between features, not to mention the additional amount of data that awaits post-processing. Thus it is evident that automatic tie point extraction procedures are now challenged by new and non-traditional aerial datasets. Our experience has also shown that oblique images fed to traditional aerial photogrammetric software, in the same way vertical imagery is used, do not produce correct results. Hence, drawing on practices learnt with convergent and unordered terrestrial image blocks (Barazzetti et al., 2010; Furukawa et al., 2010; Pierrot-Deseilligny and Clery, 2011), the main obstacle to be overcome is the generation of putative correspondences between images, and for that purpose appropriate sets (pairs, triplets, etc.) of images should be matched against each other. By appropriate we mean that their similarity shall be maximized. In the following, a way to reliably find accurate homologous points in large sets of oblique aerial imagery is described.

Image connectivity
The connectivity between images refers to a graph whose nodes and edges represent the images and their relationships (Figures 3 and 4). Two images are linked with an edge if and only if they are spatially compatible. A connectivity graph was also presented in Barazzetti et al. (2011) for faster and more reliable tie point extraction in large terrestrial image blocks. For aerial blocks, GNSS/IMU data is a necessary input to find the connectivity between images of large datasets. A connectivity graph helps in speeding up the extraction of image correspondences and in reducing the number of possible outliers. There are three conditions to be fulfilled for an image pair to be regarded as compatible:
A. Their ground footprints coincide by a given percentage;
B. The cameras' look directions are similar, or one of the cameras is nadir (in which case the similarity condition can be violated);
C. The number of extracted homologous points for the pair is above a given threshold.
As described in detail in Rupnik et al. (2013), given a graph complying with conditions A and B, we allow feature extraction between images with at least two edges. Next, the edges of the graph are enriched with another attribute, i.e. the number of extracted tie points. The developed image connectivity procedure is successful on image blocks acquired with both Maltese-cross (MC) and fan (F) systems. While MC systems fly regularly along strips like traditional vertical imagery (Figure 3a), the sweeping sensor of F systems allows an arbitrary trajectory and a more chaotic image footprint on the ground (Figure 4a). For these cases, rather than following nadir images along the flight trajectory as in MC systems (Figure 3), the mid-point of a ROI is chosen to connect the images in subsequent rings centred at the mid-point (Figure 4b).
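The three compatibility conditions can be illustrated with a minimal sketch. It is not the implementation of Rupnik et al. (2013): footprints are simplified to axis-aligned ground bounding boxes, look directions are assumed to be unit vectors derived from GNSS/IMU data, and the names `Image`, `build_graph`, `prune_by_tie_points` as well as all thresholds are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass(frozen=True)
class Image:
    name: str
    footprint: tuple   # (xmin, ymin, xmax, ymax), simplified ground footprint
    look: tuple        # unit look direction (x, y, z), assumed from GNSS/IMU
    nadir: bool = False


def area(f):
    return (f[2] - f[0]) * (f[3] - f[1])


def overlap_ratio(a, b):
    """Intersection area divided by the smaller of the two footprint areas."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / min(area(a), area(b))


def similar_look(u, v, max_angle_deg):
    """True if the angle between two unit look directions is small enough."""
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(u, v))))
    return math.degrees(math.acos(dot)) <= max_angle_deg


def build_graph(images, min_overlap=0.3, max_angle_deg=30.0):
    """Edges for pairs satisfying condition A (overlap) and B (look direction)."""
    edges = set()
    for a, b in combinations(images, 2):
        cond_a = overlap_ratio(a.footprint, b.footprint) >= min_overlap
        cond_b = a.nadir or b.nadir or similar_look(a.look, b.look, max_angle_deg)
        if cond_a and cond_b:
            edges.add(frozenset((a.name, b.name)))
    return edges


def prune_by_tie_points(edges, tie_counts, min_ties=50):
    """Condition C: keep edges whose extracted tie point count is high enough."""
    return {e for e in edges if tie_counts.get(e, 0) >= min_ties}
```

Conditions A and B need only the approximate orientation, so they can be evaluated before any matching; condition C is applied afterwards, once tie points have been extracted along the surviving edges.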

Bundle adjustment
The bundle adjustment in a multi-camera system must handle n different cameras with different interior orientation (IO) and exterior orientation (EO) parameters. The camera parameters can be retrieved without constraints, i.e. each image is oriented using an independent EO for each acquisition and a common (or independent) set of IO parameters for a given camera, or with additional constraints, i.e. equations describing the relative rotations and displacements between cameras are added to the mathematical model, lowering the number of unknowns and stabilizing the bundle solution (Rupnik et al., 2013). The non-linearity of the collinearity equations requires good initial approximations for the unknown parameters. Moreover, if there are many mismatches within the automatically generated tie points, which is often the case with oblique imagery, the system of equations is prone to divergence. Thus it is important to take every precaution in order to limit the number of mismatches present in the observations and to have good initial approximations. For years, in terrestrial applications, a sequential concatenation of triangulation and resection (or DLT) procedures has made it possible to work without initial approximations. This approach is now being used also in aerial photogrammetry.
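To illustrate why relative orientation constraints lower the number of unknowns, the following sketch composes the pose of an oblique camera from the pose of a reference camera and a fixed relative pose, and counts the EO unknowns in both cases. It assumes a perfectly rigid, time-invariant mounting and a rotation convention mapping camera to world coordinates; the function names are hypothetical and the sketch is not the parameterization used in Apero.

```python
def matvec(R, v):
    """3x3 matrix times 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def matmul(A, B):
    """Product of two 3x3 matrices."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def oblique_pose(R_ref, C_ref, R_rel, t_rel):
    """Pose of a rigidly mounted camera from the reference camera pose.

    R_ref, C_ref: rotation (camera to world) and centre of the reference camera.
    R_rel, t_rel: fixed rotation and offset expressed in the reference frame.
    """
    R = matmul(R_ref, R_rel)
    C = tuple(c + d for c, d in zip(C_ref, matvec(R_ref, t_rel)))
    return R, C


def eo_unknowns(n_exposures, n_cameras, constrained):
    """Number of EO unknowns (3 rotations + 3 translations per pose)."""
    if not constrained:
        # every image of every camera head has an independent pose
        return 6 * n_exposures * n_cameras
    # one pose per exposure plus one fixed relative pose per non-reference camera
    return 6 * n_exposures + 6 * (n_cameras - 1)
```

For a hypothetical five-head system with 100 exposures this gives 3000 unknowns without constraints and 624 with constraints.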
In our experiments the Apero bundle adjustment software is normally employed (Pierrot-Deseilligny and Clery, 2011). Apero, starting from a set of image correspondences, (i) computes the approximate values of all unknowns, (ii) performs a relative bundle adjustment in an arbitrary coordinate frame, (iii) transforms the results to a desired coordinate frame and (iv) finalizes with a bundle adjustment for absolute geo-referencing. To make sure that the initial solution computed with direct methods (spatial resection, essential matrix) is coherent, a connectivity graph (Section 2.1) is used to guide the tie point extraction procedure (and to reduce the computation time). Joining images for the initial solution shall be understood as (i) finding triplets of images within the connectivity graph to be used for computing approximate orientations, and (ii) giving all the triplets a sequence in the concatenation order. Step (ii) depends on the acquisition system, i.e. fan, block or Maltese-cross, while step (i) always remains the same. A bundle adjustment controlled in this way minimizes the risk of divergence by maximizing the similarity of images within particular triplets, hence maximizing the ratio of good to bad matches and ensuring the block's cohesion (Figure 5).
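The two steps of joining images can be sketched as follows. Step (i) enumerates the triplets (triangles) of the connectivity graph; step (ii) is shown here with a generic greedy rule, in which a triplet is appended once it shares at least two images with the already oriented set. This rule is an assumption made for illustration only: the actual ordering is system dependent, as stated above, and the function names are hypothetical.

```python
def find_triplets(edges):
    """Step (i): all image triplets that are mutually connected in the graph."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    triplets = set()
    for a, b in edges:
        # any common neighbour of a and b closes a triangle
        for c in adjacency[a] & adjacency[b]:
            triplets.add(tuple(sorted((a, b, c))))
    return sorted(triplets)


def order_triplets(triplets, seed):
    """Step (ii), greedy variant: grow the oriented set from a seed triplet."""
    oriented = set(seed)
    order = [seed]
    remaining = [t for t in triplets if t != seed]
    progress = True
    while remaining and progress:
        progress = False
        for t in remaining:
            if len(oriented & set(t)) >= 2:
                order.append(t)
                oriented |= set(t)
                remaining.remove(t)
                progress = True
                break  # restart the scan after every change
    return order, remaining
```

Triplets left in `remaining` are not connected to the seed through a shared image pair, which in practice would signal a gap in the block's cohesion.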

IMAGE MATCHING ON OBLIQUE IMAGES
In the last decade several matching algorithms have been developed (Zhang, 2005; Pierrot-Deseilligny and Paparoditis, 2006; Pons et al., 2007; Hirschmueller, 2008; Remondino et al., 2008; Furukawa and Ponce, 2010; Haala and Rothermel, 2012). Even if the methods differ from each other in terms of approach and performance (Remondino et al., 2013), the most common and powerful methods are nowadays based on the minimization of an energy function that takes into consideration both a correlation term (data term) and a regularization term (smoothing term) in order to enforce surface regularity and avoid mismatches. The results achieved by these methods on aerial nadir acquisitions have been so encouraging that several researchers consider the resulting point clouds comparable to LiDAR ones (Haala, 2013). The achievable point clouds can indeed be very dense and highly accurate: images with a GSD in the order of 10 cm could theoretically be used to produce a dense point cloud with 100 points/m2 (one point per pixel). These point clouds are usually noisier than LiDAR ones, and mismatches or wrong reconstructions can still be visible as a result of occlusions, shadowed areas or roof borders. Better results can usually be achieved by increasing the number of overlapping images to allow a higher redundancy and a better filtering. But the results are still far from being complete, especially for narrow and shadowed roads (such as in most European historical city centres) or occluded regions. These problems become more relevant when oblique images are used for the generation of dense point clouds. Indeed, compared to nadir images, oblique views are acquired from very different points of view and perspective distortions are more pronounced, so:
- objects (buildings, roads, etc.) are captured at different scales;
- the number of occluded areas normally increases due to the different looking directions;
- the depth and image GSD change more abruptly than in nadir images;
- the smaller intersection angles and baselines between images (Gerke, 2009) make point cloud generation more sensitive and further increase the level of noise in the generated data;
- the building façades are always tilted with respect to the image planes, often increasing the point cloud noise.
To overcome some of these problems, photogrammetric companies have recently started to fly with higher overlaps. But from an operational point of view, larger datasets require more point clouds to be generated, which can be very time consuming. Their subsequent visualization can also be very difficult with a normal PC, as the number of extracted points is usually 2-3 times higher than in point clouds produced from traditional nadir flights.
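As an illustration of the energy formulation mentioned at the beginning of this section, a generic form can be written as follows. It is not the specific cost function of any of the cited methods, and all symbols are assumptions introduced here: D is the depth (or disparity) map, C the correlation-based data cost, N the set of neighbouring pixel pairs, ρ a regularization penalty and λ a weight balancing the two terms. The second line reproduces the arithmetic behind the theoretical density quoted for a 10 cm GSD, assuming one 3D point per pixel.

```latex
E(D) = \sum_{p} C\bigl(p, D_p\bigr)
     + \lambda \sum_{(p,q) \in \mathcal{N}} \rho\bigl(D_p - D_q\bigr)

\text{density} = \frac{1}{\mathrm{GSD}^2}
               = \frac{1}{(0.1\,\mathrm{m})^2}
               = 100\ \text{points/m}^2
```

A larger λ favours smoother surfaces at the cost of detail; with oblique views, where depth changes abruptly at façade borders, the choice of ρ and λ is likely to be more delicate than in the nadir case.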

Point cloud generation experiences
Our experience in dense point cloud generation from oblique datasets is primarily based on MicMac (Pierrot-Deseilligny and Paparoditis, 2006). The matching algorithm works with master and slave images and is able to produce a dense point cloud with a point for each pixel of the master image that is visible in the slave images. It is based on a multi-resolution pyramidal approach, starting from a rough estimate of the scene geometry.
As oblique imagery is acquired under very different conditions compared to traditional nadir imagery, the matching algorithm must take into account a larger depth range, larger perspective deformations and image scale changes. An in-house tool was developed in order to identify the relevant images for dense matching (similarly to the aforementioned connectivity graph). Once a region of interest (ROI) is defined by a user, it is intersected with the footprints of the oriented images. The matching candidates are grouped according to their angle of incidence (looking direction) onto the ground. Master images are then selected out of all the candidates within a given group. The master images are selected in order to cover the whole ROI and to ensure an overlap between adjacent master images: an overlap of 30-40% is usually necessary in very dense urban areas to ensure the complete reconstruction of the area. A set of slave images is defined considering the overlaps with each master image. Normally four sets of masters are considered to fully reconstruct a ROI (Figure 6). An entire building with its façades is normally reconstructed (if visible) using four point clouds, from four different looking directions, which are then merged together to complete the building geometry.
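The first two steps of this selection (intersection with the ROI and grouping by looking direction) can be sketched as below. This is not the in-house tool itself: footprints and the ROI are simplified to axis-aligned boxes, the four groups are taken as the cardinal quadrants of the look azimuth, and the names and the nadir tolerance are hypothetical. The subsequent choice of masters covering the ROI with 30-40% overlap is not shown.

```python
import math
from collections import defaultdict


def boxes_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def looking_group(look, nadir_tol_deg=10.0):
    """Classify a look direction (x east, y north, z up) as nadir or N/E/S/W."""
    x, y, z = look
    off_nadir = math.degrees(math.atan2(math.hypot(x, y), -z))
    if off_nadir < nadir_tol_deg:
        return "nadir"
    azimuth = math.degrees(math.atan2(x, y)) % 360.0  # 0 = north, 90 = east
    return ("N", "E", "S", "W")[int(((azimuth + 45.0) % 360.0) // 90.0)]


def group_candidates(images, roi):
    """Group images whose footprint intersects the ROI by looking direction.

    images: iterable of (name, footprint, look) tuples.
    """
    groups = defaultdict(list)
    for name, footprint, look in images:
        if boxes_intersect(footprint, roi):
            groups[looking_group(look)].append(name)
    return dict(groups)
```

Each of the four oblique groups would then yield its own set of master images and, after matching, one point cloud per looking direction.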

Mixed pixels filtering and noise reduction
Due to the slant views and depth variations, several wrong object points (i.e. mixed pixels) can be generated in the reconstructed point clouds. These mixed pixels are usually distributed along the direction of the perspective rays, where object borders and shadows occur, and affect the correctness (as well as the visual quality) of the resulting point cloud (Figure 7). These points are usually grouped in clusters and cannot be easily removed simply by analysing the proximity of each point to its neighbours. For this reason, exploiting the master image position and attitude, a filtering scheme can be applied (Nex et al., 2013). Considering the direction between the master image perspective centre and each point in space, as well as the local surface directions in the point neighbourhood (red points in Figure 7a), erroneous points are filtered out when the local surface of the point cloud has almost the same direction as the perspective ray. Despite these possible errors, the potential of oblique datasets to deliver dense point clouds of urban areas is enormous (Figure 8). The point clouds have different levels of noise and completeness according to the texture of the modelled façade and the angle between the image plane and the façade plane. Façades with glass surfaces can increase the level of noise, and most of these parts are usually removed during the noise filtering.

Figure 8. Examples of dense point clouds derived from oblique aerial datasets. Complex façades with repeated patterns, uniform texture and large glass areas lead to noisier and less pleasing results (lower row).
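The geometric test behind this filtering can be sketched as follows. The sketch is a simplified reading of the scheme of Nex et al. (2013), not their implementation: the local surface direction is assumed to be available as a precomputed normal per point (e.g. from a plane fit in the neighbourhood), and the function names and the 10° threshold are hypothetical.

```python
import math


def unit(v):
    """Normalize a 3-vector."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def is_mixed_pixel(point, normal, centre, min_angle_deg=10.0):
    """True if the local surface is almost aligned with the perspective ray.

    point:  3D point of the cloud.
    normal: local surface normal at the point (assumed precomputed).
    centre: perspective centre of the master image.
    """
    ray = unit(tuple(p - c for p, c in zip(point, centre)))
    n = unit(normal)
    cos_ray_normal = abs(sum(r * k for r, k in zip(ray, n)))
    # angle between the ray and the local surface plane (90 deg minus ray/normal angle)
    grazing = math.degrees(math.asin(min(1.0, cos_ray_normal)))
    return grazing < min_angle_deg


def filter_cloud(points, normals, centre, min_angle_deg=10.0):
    """Keep only the points that are not flagged as mixed pixels."""
    return [
        p
        for p, n in zip(points, normals)
        if not is_mixed_pixel(p, n, centre, min_angle_deg)
    ]
```

A surface seen frontally gives a grazing angle close to 90° and is kept, whereas a cluster of points strung along the ray gives a value close to 0° and is removed. Real façades seen at very oblique angles would also be affected, so the threshold has to be chosen with care.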

CONCLUSIONS
As a response to the expanding airborne technology market and the interest in it for mapping purposes, an overview of existing oblique multi-camera systems and processing methods has been presented. Among the many available commercial systems one can observe that Maltese-cross configurations are probably taking the lead: they are offered in different resolutions, modular or fixed, with 2-10 camera heads, with and without forward, roll and vibration motion compensation (FMC, RMC, VC) algorithms, delivering individual or large-format (merged) imagery and acquiring data with different overlaps. The offer of oblique systems is very rich, but since the technology is relatively new, it is still an open question how to apply it, what the strengths and weaknesses of particular platforms are and which acquisition patterns are suitable for metric mapping. The post-processing of oblique images is also demanding and still an open research issue. The imaging geometry is analogous to close-range applications, with all its characteristics: varying scale within the images, illumination changes, multiple viewing directions and large perspective differences between the views prevent finding correspondences across views using traditional and well-established methods. But, as shown in the paper, if appropriate approaches are taken, it is possible to correctly triangulate a large number of images from fan and Maltese-cross systems. When it comes to dense image matching and point cloud generation, open issues remain related to the minimum overlap required during the flights, the optimal number of reference images to be used, illumination and scale changes between overlapping images and mismatches due to depth variations. As reported by other researchers, the simple merging of point clouds produces blur and inhomogeneous radiometry, opening another problem that shall be addressed too. Advanced methodologies need to be developed or fine-tuned in order to mitigate all these effects and to improve the quality of the resulting point clouds.
Notably, oblique images provide a deeper and more complete description of urban areas, making it possible to extract more information in the 'smart city' domain. Integrating nadir and oblique views could enhance the derivation of knowledge, but so far the topic has not been thoroughly investigated and exploited. Also, the use of different spectral bands (e.g. thermal) from oblique views could open a new spectrum of applications (e.g. heat loss monitoring of building façades). Last but not least, since dense image matching algorithms generate a very large number of 3D points (counted in billions), visualization and processing are challenging on standard PCs and leave room for research. A benchmark to evaluate oblique systems for 3D city modelling and mapping purposes is still missing. For this reason, thanks to a strong collaboration between EuroSDR and ISPRS, a benchmarking project is going to be established with the aim of creating a testfield composed of oblique and nadir aerial imagery, UAV and terrestrial images as well as ground truth data (GCPs, maps, 3D building models, etc.).

Figure 2: Same urban scene, acquired with a Midas 5 system, viewed from different perspectives (nadir and two oblique, respectively).

Figure 3: An example of image concatenation (Midas dataset): all perspective centres (a) and progress of the concatenation (b-d) towards the final connectivity graph of the image block (d).

Figure 5. Image orientation results of a large oblique block (VisionMap A3 dataset, 280 images) over an urban area (ca. 0.5 x 0.5 km): camera poses retrieved without (a) and with connectivity graph (b, c). The sparse point cloud of the extracted tie points (d).

Figure 6. Four sets of master images in the four looking directions.

Figure 7. Scheme of the mixed pixels filtering (a). Practical example of a point cloud generated from one looking direction (b) and the filtered one after the removal of the mixed pixels (c).