TOWARD AUTOMATED FAÇADE TEXTURE GENERATION FOR 3D PHOTO-REALISTIC CITY MODELLING WITH SMARTPHONES OR TABLET PCS

This paper proposes an automated model-image fitting algorithm for generating façade texture images from pictures taken by smartphones or tablet PCs. Façade texture generation requires tremendous manual labour and has therefore been the bottleneck of 3D photo-realistic city modelling. With advances in micro-electro-mechanical systems (MEMS), a camera, a global positioning system (GPS) receiver, and a gyroscope (G-sensors) can all be integrated into a smartphone or a tablet PC. These sensors make direct georeferencing possible for the pictures taken by such devices. Because the accuracy of these sensors cannot be compared with that of surveying instruments, the image position and orientation derived from them are not accurate enough for photogrammetric measurement. This paper adopts the least-squares model-image fitting (LSMIF) algorithm to iteratively improve the image's exterior orientation. The image position from the GPS and the image orientation from the gyroscope are treated as initial values. By fitting the projection of the wireframe model to the edge pixels extracted from the image, the exterior orientation elements are solved when the optimal fit is achieved. With precise exterior orientation elements, the wireframe model of the building can be correctly projected onto the image and, therefore, the façade texture image can be extracted from the picture.


INTRODUCTION
A photo-realistic 3D building model not only describes the geometry of a building but also represents its real appearance. There are a number of approaches for reconstructing the geometric model from photogrammetric images, from LiDAR point clouds, or from both (Braun et al., 1995; Chapman et al., 1992; Förstner, 1999; Grün, 2000; Lang and Förstner, 1996; Lowe, 1991; Tseng and Wang, 2003; Veldhuis, 1998; Wang and Tseng, 2009). However, façade mapping, which relies on manual operations to create texture images, is still the bottleneck of photo-realistic building modelling. Recent mobile computing devices, such as smartphones and tablet PCs, are usually equipped with not only a high-resolution camera but also a built-in GPS receiver and G-sensors. These sensors can be used for direct georeferencing while taking pictures of buildings. This paper proposes a concept toward automated façade texture generation for photo-realistic 3D building modelling using smartphones or tablet PCs. When a picture is taken, the device's 3D coordinates are recorded from the built-in GPS receiver and its three rotation angles are recorded from the G-sensors. However, these parameters are too coarse to reconstruct the object-space stereo model for photogrammetric purposes. Therefore, a model-image fitting algorithm based on least-squares adjustment is proposed to determine the precise image orientation.
The reconstruction of photo-realistic 3D building models involves three major issues: (1) modelling the object; (2) determining the image orientation; and (3) creating realistic texture images from photos. In this paper, aerial photographs are used to reconstruct the geometric models of buildings, while the pictures taken by the personal computing device are used for the façade texture. By introducing the "floating model" concept, the object modelling and image orientation problems can be solved efficiently through semi-automated procedures based on least-squares model-image fitting (LSMIF). A user-friendly interactive program is designed for an operator to choose a suitable model and to move, rotate, or resize it so that it approximately fits all of the images. An ad hoc least-squares model-image fitting algorithm is developed to solve for the optimal fit between the projected model line segments and the extracted edge pixels. Once the object model has been extracted and the photo orientation determined, the creation of the realistic texture image, also called inverse mapping, can be automated by coordinate transformation and image resampling. Figure 1 shows the workflow of the proposed photo-realistic 3D building modelling procedure.
In the proposed workflow, two procedures, "model selection" and "approximate fitting", still require human interaction, because manual image interpretation is more robust and more efficient than computer algorithms. The other computational work, such as "model projection", "precise fitting", and "image clipping", is carried out by computer algorithms. Therefore, the proposed procedure should be more efficient than fully manual methods while remaining more robust than fully automated approaches.
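As a rough illustration of the inverse mapping step (coordinate transformation followed by image resampling), the following Python sketch resamples a planar rectangular façade into a texture grid. It is not the authors' implementation: the function names, the corner ordering, the nearest-neighbour sampling, and the `project` callable, which stands in for the projection from object space to image coordinates with the refined exterior orientation, are all assumptions made for illustration.

```python
import math

def facade_texture(image, corners, project, rows, cols):
    """Resample a planar facade into a rows x cols texture (nearest neighbour).

    image   : 2D list of pixel values, indexed image[row][col]
    corners : facade corners in object space, ordered
              upper-left, upper-right, lower-right, lower-left (assumed order)
    project : hypothetical callable mapping an object point (X, Y, Z)
              to image coordinates (col, row)
    """
    ul, ur, lr, ll = corners
    texture = []
    for r in range(rows):
        v = (r + 0.5) / rows                      # relative height of the texel centre
        line = []
        for c in range(cols):
            u = (c + 0.5) / cols                  # relative width of the texel centre
            top = [a + u * (b - a) for a, b in zip(ul, ur)]
            bottom = [a + u * (b - a) for a, b in zip(ll, lr)]
            point = [a + v * (b - a) for a, b in zip(top, bottom)]
            col, row = project(point)             # object space -> image space
            i, j = math.floor(row + 0.5), math.floor(col + 0.5)
            inside = 0 <= i < len(image) and 0 <= j < len(image[0])
            line.append(image[i][j] if inside else None)
        texture.append(line)
    return texture

def toy_project(point):
    """Toy stand-in for the real projection: a facade in the X-Z plane seen head-on."""
    X, _, Z = point
    return (X - 0.5, 3.5 - Z)

# Toy image whose pixel value encodes its position (10 * row + col).
image = [[10 * r + c for c in range(6)] for r in range(4)]
corners = [(1.0, 0.0, 3.0), (5.0, 0.0, 3.0), (5.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
texture = facade_texture(image, corners, toy_project, rows=2, cols=4)
# texture == [[11, 12, 13, 14], [21, 22, 23, 24]]
```

Looping over the texture grid and looking up the source pixel, rather than looping over the photo, guarantees that every texel receives a value, which is why the step is called inverse mapping. A practical version would likely use bilinear or better interpolation instead of nearest-neighbour sampling.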

MODEL-IMAGE CORRESPONDENCE
To deal with the modelling problem, this paper adopts the concept of floating models (Wang, 2004). Floating models can be categorized into four types: point, linear feature, plane, and volumetric solid. Each type contains various primitive models for practical needs. For example, the linear features include the line segment and the arc; the planes include the rectangle, the circle, the ellipse, the triangle, the pentagon, etc.; and the volumetric solids include the box, the gable-roof house, the cylinder, the cone, etc. Despite the variety in their shapes, every primitive model has a datum point and is associated with a set of pose parameters and a set of shape parameters. The datum point and the pose parameters determine the position of the floating model in object space. Three translation parameters (dX, dY, dZ) are adequate to represent the position, and three rotation parameters, tilt (t) around the Y-axis, swing (s) around the X-axis, and azimuth (α) around the Z-axis, are adequate to represent the rotation of a primitive model. Figure 2 shows four examples, one from each type of model, with changes of the pose parameters. The X'-Y'-Z' coordinate system defines the model space and the X-Y-Z coordinate system defines the object space. The little pink sphere indicates the datum point of the model. The yellow primitive model is in the original position and pose, while the grey model depicts the position and pose after changing the pose parameters (dX, dY, dZ, t, s, α). The model "floats" in space under the control of these pose parameters. The volume and shape of the model remain the same while the pose parameters change.
The shape parameters describe the shape and size of the primitive model; e.g., a box has three shape parameters: width (w), length (l), and height (h). Changing the values of the shape parameters stretches the primitive along its three dimensions but keeps its shape as a rectangular box. Different primitives may be associated with different shape parameters; e.g., a gable-roof house primitive has an additional shape parameter, the roof height (rh). Figure 3 shows three examples, one from each type of model except the point, with changes of the shape parameters. The point is an exceptional case that does not have any shape parameters. The yellow one is the original model, while the grey one is the model after changing the shape parameters. The figure points out the other important characteristic of the floating model: a flexible shape under certain constraints. Changing the shape parameters does not affect the position or the pose of the model.
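The separation between shape and pose parameters can be made concrete with a minimal Python sketch of a box primitive. This is an illustrative assumption, not the authors' data structure: the function names, the placement of the datum point at a corner of the box, the assignment of w, l, h to the X', Y', Z' axes, and the order in which the three rotations are applied are all chosen here for the example.

```python
import math

def box_vertices(w, l, h):
    """Eight corners of a box primitive in model space (X'-Y'-Z').

    Assumption: the datum point is the corner at the model-space origin,
    with width along X', length along Y', and height along Z'.
    """
    return [(x, y, z) for z in (0.0, h) for y in (0.0, l) for x in (0.0, w)]

def rotate(point, t, s, az):
    """Apply tilt (about Y), swing (about X), then azimuth (about Z), in radians.

    Assumption: this rotation order is chosen for illustration only.
    """
    x, y, z = point
    x, z = x * math.cos(t) + z * math.sin(t), -x * math.sin(t) + z * math.cos(t)
    y, z = y * math.cos(s) - z * math.sin(s), y * math.sin(s) + z * math.cos(s)
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    return (x, y, z)

def to_object_space(vertices, dX, dY, dZ, t=0.0, s=0.0, az=0.0):
    """Rotate about the datum point, then translate by (dX, dY, dZ)."""
    placed = []
    for v in vertices:
        x, y, z = rotate(v, t, s, az)
        placed.append((x + dX, y + dY, z + dZ))
    return placed

# Pose parameters move and turn the box without changing its size.
pose = dict(dX=100.0, dY=200.0, dZ=50.0, az=math.pi / 2)
original = to_object_space(box_vertices(10.0, 20.0, 5.0), **pose)

# Shape parameters resize the box without moving its datum point.
widened = to_object_space(box_vertices(12.0, 20.0, 5.0), **pose)
```

In this sketch `original[0]` and `widened[0]` are the datum point, which stays at (100, 200, 50) when the width changes, while the edge from the datum point to `original[1]` keeps its length of 10 after the azimuth rotation.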

LEAST-SQUARES MODEL-IMAGE FITTING
The principle of the model-image fitting algorithm is to adjust either the model parameters or the image orientation parameters so that the model projection fits the building images. Since the floating model can be treated as a wireframe model, the edge pixels are selected as the fitting targets. The optimal fit is achieved by minimizing the sum of the squared perpendicular distances from the edge pixels to the corresponding projected lines of the wireframe model. For both geometric modelling and image orientation, an approximate fit is required before applying the LSMIF algorithm. An interactive program is developed for model selection, approximate fitting, and visualization. To get as close as possible to the correct fit, the program provides a user interface that allows the operator to resize, rotate, and move a model so that it approximately fits the corresponding building images. Thanks to the approximate fit, the LSMIF iteratively pulls the model to the optimal fit instead of blindly searching for the solution. To avoid the disturbance of irrelevant edge pixels, only the edge pixels distributed within the specified buffer zones are used in the calculation of the fitting algorithm. Figure 6 depicts the extracted edge pixels $T_{ijk}$ and the buffer determined by a projected edge $v_{i1}v_{i2}$ of the model. The subscript i is the index of the edge line, j is the index of the overlapped image, and k is the index of the edge pixel. Filtering edge pixels with a buffer is reasonable because the discrepancies between the projected edges and the corresponding edge pixels should be small when either the model parameters or the image orientation parameters are approximately known. The optimal fitting condition sought is that the projected model edge lines fall exactly on the building edges in the images. In Eq. (1), the distance $d_{ijk}$ represents the discrepancy between an edge pixel $T_{ijk}$ and its corresponding edge line $v_{i1}v_{i2}$, which is expected to be zero. Therefore, the objective of the fitting function is to minimize the sum of squares of $d_{ijk}$. Suppose a projected edge line is composed of the projected vertices $v_{i1}(x_{i1}, y_{i1})$ and $v_{i2}(x_{i2}, y_{i2})$, and there is an edge pixel $T_{ijk}(x_{ijk}, y_{ijk})$ located inside the buffer. The distance $d_{ijk}$ from the point $T_{ijk}$ to the edge $v_{i1}v_{i2}$ can be formulated as the following equation:

$$ d_{ijk} = \frac{\left| (y_{i1} - y_{i2})\,x_{ijk} + (x_{i2} - x_{i1})\,y_{ijk} + (x_{i1} y_{i2} - x_{i2} y_{i1}) \right|}{\sqrt{(x_{i1} - x_{i2})^2 + (y_{i1} - y_{i2})^2}} \tag{1} $$

where i is the index of the edge line, j is the index of the overlapped image, and k is the index of the edge pixel.
The photo coordinates $v_{i1}(x_{i1}, y_{i1})$ and $v_{i2}(x_{i2}, y_{i2})$ are functions of the unknown model parameters, whereas the exterior orientation parameters of the photos are known. Therefore, $d_{ijk}$ is a function of the model parameters. Taking a box model as an example, $d_{ijk}$ is a function of w, l, h, α, dX, dY, and dZ, under the assumption that a normal building rarely has a tilt angle (t) or a swing angle (s). The least-squares solution for the unknown parameters can be expressed as:

$$ \sum d_{ijk}^2 = \sum \left[ F_{ijk}(w, l, h, \alpha, dX, dY, dZ) \right]^2 \rightarrow \min \tag{2} $$

Eq. (2) is nonlinear in the unknowns, so Newton's method is applied to solve for them. The nonlinear function is differentiated with respect to the unknowns and becomes a linear function of the increments of the unknowns as follows:

$$ 0 + v_{ijk} = F_{ijk0} + \left(\frac{\partial F_{ijk}}{\partial w}\right)_0 \Delta w + \left(\frac{\partial F_{ijk}}{\partial l}\right)_0 \Delta l + \left(\frac{\partial F_{ijk}}{\partial h}\right)_0 \Delta h + \left(\frac{\partial F_{ijk}}{\partial \alpha}\right)_0 \Delta \alpha + \left(\frac{\partial F_{ijk}}{\partial dX}\right)_0 \Delta dX + \left(\frac{\partial F_{ijk}}{\partial dY}\right)_0 \Delta dY + \left(\frac{\partial F_{ijk}}{\partial dZ}\right)_0 \Delta dZ \tag{3} $$

in which $F_{ijk0}$ is the approximation of the function $F_{ijk}$, calculated with the given approximations of the unknown parameters, $v_{ijk}$ is the residual, and $\Delta w, \ldots, \Delta dZ$ are the increments of the unknowns. Given a set of approximations of the unknowns, the least-squares solution for the increments can be obtained, and the approximations are updated by the increments. By repeating this calculation, the unknown parameters are solved iteratively. Eq. (2) and Eq. (3) are used for geometric model reconstruction. For image orientation determination, they are modified into Eq. (4) and Eq. (5), in which the unknowns become the increments of the image orientation parameters.
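The distance measure and the buffer filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the buffer is assumed to be a rectangle of a given half-width around the projected segment, and the distance is kept signed because only its square enters the objective function.

```python
import math

def signed_distance(pixel, v1, v2):
    """Signed perpendicular distance from an edge pixel to the line through v1 and v2."""
    (x, y), (x1, y1), (x2, y2) = pixel, v1, v2
    length = math.hypot(x1 - x2, y1 - y2)
    return ((y1 - y2) * x + (x2 - x1) * y + (x1 * y2 - x2 * y1)) / length

def in_buffer(pixel, v1, v2, half_width):
    """True if the pixel projects onto the segment v1-v2 and lies within half_width of it.

    Assumption: a rectangular buffer; the source does not specify the buffer shape.
    """
    (x, y), (x1, y1), (x2, y2) = pixel, v1, v2
    ex, ey = x2 - x1, y2 - y1
    t = ((x - x1) * ex + (y - y1) * ey) / (ex * ex + ey * ey)  # position along the segment
    return 0.0 <= t <= 1.0 and abs(signed_distance(pixel, v1, v2)) <= half_width

# One projected edge and four candidate edge pixels (image coordinates).
v1, v2 = (0.0, 0.0), (10.0, 0.0)
pixels = [(2.0, 0.5), (5.0, -1.0), (8.0, 4.0), (12.0, 0.2)]

kept = [p for p in pixels if in_buffer(p, v1, v2, half_width=2.0)]
sum_sq = sum(signed_distance(p, v1, v2) ** 2 for p in kept)
# kept == [(2.0, 0.5), (5.0, -1.0)]; sum_sq == 1.25
```

The third pixel is rejected because it is too far from the edge, and the fourth because it lies beyond the end of the segment, so neither can disturb the fit.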
The linearized equations can be expressed in matrix form as $V = AX - L$, where A is the matrix of partial derivatives, X is the vector of the increments, L is the vector of approximations (with elements $-F_{ijk0}$), and V is the vector of residuals. The objective function can then be expressed as $q = V^T V \rightarrow \min$. In each iteration, X is solved by the matrix operation $X = (A^T A)^{-1} A^T L$. The iteration normally converges to the correct answer. However, insufficient relevant image features, interference from irrelevant features or noise, or poor initial approximations may lead the computation to a wrong answer.
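The iteration can be illustrated with a small, self-contained Python sketch. It is a toy under stated assumptions, not the authors' implementation: a 2D rectangle with three pose unknowns (dx, dy, azimuth) stands in for the projected wireframe model, the pixel-to-edge assignment is fixed instead of being re-derived from buffers, the partial derivatives are approximated numerically, and all function names are hypothetical.

```python
import math

def rect_edges(dx, dy, az, w=4.0, l=2.0):
    """Edges of a w-by-l rectangle rotated by az and shifted by (dx, dy)."""
    corners = [(0.0, 0.0), (w, 0.0), (w, l), (0.0, l)]
    c, s = math.cos(az), math.sin(az)
    pts = [(c * x - s * y + dx, s * x + c * y + dy) for x, y in corners]
    return [(pts[i], pts[(i + 1) % 4]) for i in range(4)]

def distance(pixel, edge):
    """Signed perpendicular distance from a pixel to the line through an edge."""
    (x, y), ((x1, y1), (x2, y2)) = pixel, edge
    return ((y1 - y2) * x + (x2 - x1) * y + (x1 * y2 - x2 * y1)) / math.hypot(x1 - x2, y1 - y2)

def residuals(params, observations):
    """F for every (edge index, pixel) pair at the current parameters."""
    edges = rect_edges(*params)
    return [distance(pixel, edges[i]) for i, pixel in observations]

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def fit(params, observations, iterations=10, eps=1e-6):
    """Iterate X = (A^T A)^-1 A^T L and update the approximations by X."""
    params = list(params)
    n = len(params)
    for _ in range(iterations):
        f0 = residuals(params, observations)
        cols = []                                   # columns of A (forward differences)
        for j in range(n):
            shifted = params[:]
            shifted[j] += eps
            fj = residuals(shifted, observations)
            cols.append([(a - b) / eps for a, b in zip(fj, f0)])
        L = [-v for v in f0]                        # L = -F0
        N = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(n)] for i in range(n)]
        u = [sum(p * q for p, q in zip(cols[i], L)) for i in range(n)]
        X = solve(N, u)                             # increments of the unknowns
        params = [p + d for p, d in zip(params, X)]
    return params

# Synthetic, noise-free edge pixels sampled on the edges of the "true" pose.
true_pose = (1.0, 2.0, 0.3)
observations = []
for i, ((x1, y1), (x2, y2)) in enumerate(rect_edges(*true_pose)):
    for t in (0.2, 0.5, 0.8):
        observations.append((i, (x1 + t * (x2 - x1), y1 + t * (y2 - y1))))

estimate = fit((0.6, 2.3, 0.2), observations)       # start from a rough pose
```

Starting from a rough pose, as the approximate fitting would provide, the estimate in this noise-free toy case should approach `true_pose` within a few iterations. With a single edge the normal matrix would be singular, because sliding the model along that edge does not change any distance; this is one concrete form of the insufficient-feature problem mentioned above.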

CONCLUSIONS
Photo-realistic 3D building models are basic geospatial information infrastructure for many applications. This paper proposes a concept toward automated texture generation, based on the least-squares model-image fitting algorithm, to overcome the façade texturing bottleneck. Instead of precise and expensive mobile mapping instruments, personal mobile computing devices are used to collect façade images of the buildings. Thanks to the built-in GPS receiver and G-sensors, the approximate image orientation parameters are recorded directly when the picture is taken. The orientation is then refined by iteratively fitting the model to the image. Some experiments are still under way, so the results will be presented at the conference in an interactive way.

Figure 3 :
Figure 2: Pose parameters adjustment of floating models

Figure 6: Extracted edge pixels and buffer