AN AUTOMATIC 3D RECONSTRUCTION METHOD BASED ON MULTI-VIEW STEREO VISION FOR THE MOGAO GROTTOES

This paper presents an automatic three-dimensional (3D) reconstruction method based on multi-view stereo vision for the Mogao Grottoes. 3D digitization techniques, especially methods based on binocular stereo vision, have been used in cultural heritage conservation and replication over the past decade. However, mismatched points are inevitable in traditional binocular stereo matching because binocular images contain repetitive or similar features. To greatly reduce the probability of mismatching and improve measurement precision, a portable four-camera photographic measurement system is used for 3D modelling of a scene. The four cameras of the measurement system form six binocular systems with baselines of different lengths, which add extra matching constraints and offer multiple measurements. A matching error based on the epipolar constraint is introduced to remove mismatched points. An accurate point cloud is then generated by multi-image matching and sub-pixel interpolation. Delaunay triangulation and texture mapping are performed to obtain the 3D model of the scene. The method has been tested on the 3D reconstruction of several scenes of the Mogao Grottoes, and the good results verify its effectiveness.


INTRODUCTION
With the rapid development of computer technology and sensing technology, three-dimensional (3D) digitization of objects has attracted more and more attention over the past decades. 3D modelling technology has been widely applied in various digitization fields, especially cultural heritage conservation, where it is mainly used for digital recording and replication. Considering the precious value of cultural heritage objects, non-contact and non-destructive measurement approaches are generally taken to acquire 3D models. For practical application, automatic, fast and low-cost 3D reconstruction methods with high precision are required.
A number of active and passive technologies (Pavlidis et al., 2007) have been developed for 3D digitization of cultural heritage. Laser scanning methods (Huang et al., 2013) and structured light methods (Zhang et al., 2011) are typical active methods. The most significant advantage of laser scanning is high accuracy in geometry measurements. Nevertheless, the models reconstructed by laser scanning usually lack good texture, and such devices are expensive. As passive methods, vision-based methods can capture both geometry information and texture information while requiring less expensive devices. According to the number of cameras used, vision-based methods are divided into monocular vision, binocular vision and multi-view vision. Monocular vision methods can obtain depth information from two-dimensional characteristics of a single image or of multiple images from a single view (Massot and Hérault, 2008; Haro and Pardàs, 2010). Such methods are usually not very robust to the environment. Monocular vision methods can also gain 3D information from a sequence of images taken from different views (structure from motion, SFM) (Chen et al., 2012), but the SFM method has a high cost in time and memory. Binocular vision methods acquire 3D geometry information from a pair of images captured from two known positions and angles. They offer a high degree of automation and stability in reconstruction, but they easily produce mismatched points because binocular images contain repetitive or similar features (Scharstein and Szeliski, 2002). In order to reduce the possibility of mismatching, 3D measurement systems based on multi-view vision have been developed (Setti et al., 2012). Generally, these systems have a complex structure.
This paper presents an automatic 3D reconstruction method based on multi-view stereo vision. The method has been used to reconstruct 3D models of several scenes of cave No. 172 in the Mogao Grottoes using a portable four-camera photographic measurement system (PFPMS) (Zhong and Liu, 2012). The PFPMS is composed of four cameras, which add extra matching constraints and offer redundant measurements, thereby reducing the possibility of mismatching and improving measurement accuracy relative to traditional binocular systems.

3D RECONSTRUCTION METHODOLOGY
The reconstruction of a wall scene is taken as an example to illustrate the whole process of 3D reconstruction, including multi-view image acquisition, multi-view image processing, triangulation and texture mapping.

Multi-view image acquisition
As the main hardware system, the PFPMS consists of four cameras with the same configuration parameters, which observe the target object at a distance of 2.0-5.0 m. Four images with a resolution of 6016×4000 pixels can be captured synchronously by a button controller connected to the shutter switches of the four cameras. The overall structure of the PFPMS is similar to that of a common binocular vision system; the difference is that each camera of the binocular system is replaced by two cameras, one above the other, as shown in Figure 1. The four cameras are arranged in a rectangle and their optical axes are parallel to each other to minimize the impact of perspective distortion on feature matching. The distance between the two left cameras (or the two right cameras) is about 15 cm, and the distance between the two upper cameras (or the two lower cameras) is about 75 cm. Because the baseline between the two left cameras or the two right cameras is short, the two images they capture differ very little, which helps improve matching accuracy. In addition, the left cameras and the right cameras form four binocular systems with a long baseline, which are used to calculate the spatial position of feature points. Thus every point can be measured four times to improve precision. The parameters of the four cameras and the relative position parameters of every pair of cameras need to be obtained before the measurement.
The cameras can be calibrated with a traditional pinhole model (Tsai, 1987), and then the 3D space coordinates of any point can be calculated from its coordinates in the four images.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-4/W5, 2015. Indoor-Outdoor Seamless Modelling, Mapping and Navigation, 21-22 May 2015, Tokyo, Japan.
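The camera layout described above can be checked with a short sketch. It places the four cameras at the approximate spacings quoted in the text (15 cm vertically, 75 cm horizontally), enumerates all camera pairs and splits them by baseline length. The coordinate frame, the function names and the 0.30 m cut-off between "short" and "long" are assumptions made for illustration only; they are not taken from the PFPMS specification.

```python
from itertools import combinations
from math import hypot

# Camera centres in metres in the plane of the rig (x to the right, y up).
# The 0.15 m and 0.75 m spacings are the approximate values quoted in the text.
CAMERAS = {
    "UL": (0.00, 0.15),
    "LL": (0.00, 0.00),
    "UR": (0.75, 0.15),
    "LR": (0.75, 0.00),
}

SHORT_BASELINE_LIMIT = 0.30  # hypothetical cut-off between short and long (metres)


def baseline_pairs(cameras=CAMERAS):
    """Return {(cam_a, cam_b): baseline length in metres} for every camera pair."""
    pairs = {}
    for (name_a, pos_a), (name_b, pos_b) in combinations(cameras.items(), 2):
        pairs[(name_a, name_b)] = hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
    return pairs


def split_by_baseline(pairs, limit=SHORT_BASELINE_LIMIT):
    """Split the camera pairs into short-baseline and long-baseline lists."""
    short = sorted(p for p, b in pairs.items() if b < limit)
    long_ = sorted(p for p, b in pairs.items() if b >= limit)
    return short, long_


if __name__ == "__main__":
    short, long_ = split_by_baseline(baseline_pairs())
    print("short baseline:", short)
    print("long baseline:", long_)
```

Under these assumed positions the six pairs fall into two short-baseline systems (UL-LL, UR-LR) and four long-baseline systems (UL-UR, UL-LR, LL-UR, LL-LR).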

Multi-view image processing
Multi-view image processing is divided into extraction of feature points and matching of feature points. For convenience, let the UL, LL, UR and LR images denote the upper-left, lower-left, upper-right and lower-right images respectively.

Extraction of feature points:
Because they can be detected quickly and located accurately, Harris corners (Harris and Stephens, 1988) are chosen as feature points for matching. Corners are extracted block by block over a partition of the image to ensure that they are uniformly distributed. In order to reduce the search range during matching, the detected Harris corners are stored by sub-region. The feature points in the four images are all extracted in this way.
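The sub-regional storage mentioned above can be illustrated with a simple grid of buckets, so that a later search only visits the cells that overlap the search window. This is a minimal sketch, not the authors' implementation: the 64-pixel cell size and the function names are hypothetical, and the corners are assumed to have been detected already.

```python
from collections import defaultdict

CELL = 64  # hypothetical cell size in pixels


def build_corner_grid(corners, cell=CELL):
    """Group (x, y) corner coordinates by the grid cell they fall in."""
    grid = defaultdict(list)
    for x, y in corners:
        grid[(int(x) // cell, int(y) // cell)].append((x, y))
    return grid


def corners_in_window(grid, x_min, y_min, x_max, y_max, cell=CELL):
    """Return the stored corners inside an axis-aligned window (bounds inclusive)."""
    found = []
    # Visit only the cells that overlap the window instead of every corner.
    for cx in range(int(x_min) // cell, int(x_max) // cell + 1):
        for cy in range(int(y_min) // cell, int(y_max) // cell + 1):
            for x, y in grid.get((cx, cy), ()):
                if x_min <= x <= x_max and y_min <= y <= y_max:
                    found.append((x, y))
    return found
```

For a 6016×4000 image, a lookup of this kind touches a handful of cells around the predicted position rather than the full corner list, which is the point of storing the corners by sub-region.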

Matching of feature points:
Template matching methods are used to search for homologous image points in most stereo matching algorithms. Traditional template matching methods mainly include the sum of squared differences (SSD), the sum of absolute differences (SAD), normalized cross correlation (NCC) and zero-mean normalized cross correlation (ZNCC) (Lazaros et al., 2008). These methods measure the similarity between two points by comparing the pixels inside a rectangular window around one point with the pixels inside a rectangular window around the other. ZNCC is chosen for matching because of its stronger robustness to noise; it is evaluated over the template windows centred on the two points being compared. The epipolar lines between the images can be obtained from the known parameters of the four cameras (Xu et al., 2012). The matching process proceeds step by step under the epipolar constraint, and Figure 4 shows the whole processing flow of the above-mentioned matching method.
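For reference, the textbook form of ZNCC subtracts the mean grey value of each window, sums the products of the centred values over the window, and divides by the square root of the product of the two centred sums of squares. The sketch below follows that standard definition; the function names, the list-of-rows image layout and the handling of a flat window are assumptions, and only the half-size of 10 pixels comes from the paper.

```python
from math import sqrt


def _window(img, x, y, n):
    """Collect the (2n+1) x (2n+1) grey values centred on (x, y); img is indexed img[y][x]."""
    if x - n < 0 or y - n < 0 or y + n >= len(img) or x + n >= len(img[0]):
        raise ValueError("template window leaves the image")
    return [img[y + j][x + i] for j in range(-n, n + 1) for i in range(-n, n + 1)]


def zncc(img1, img2, x1, y1, x2, y2, n=10):
    """Zero-mean normalized cross correlation between two template windows.

    Returns a value in [-1, 1]; a flat (constant) window gives 0.0 by convention here.
    """
    w1 = _window(img1, x1, y1, n)
    w2 = _window(img2, x2, y2, n)
    m1 = sum(w1) / len(w1)
    m2 = sum(w2) / len(w2)
    num = sum((a - m1) * (b - m2) for a, b in zip(w1, w2))
    den = sqrt(sum((a - m1) ** 2 for a in w1) * sum((b - m2) ** 2 for b in w2))
    return num / den if den > 0 else 0.0
```

Because the means are removed and the result is normalized, a window that differs only by a gain and an offset in brightness still scores close to 1, which is one reason ZNCC is often preferred over SSD or SAD when the cameras do not expose identically.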

[Figure 4 flowchart nodes: Start; Do candidate points exist? (Yes/No); Maximum ZNCC > 0.7? (Yes/No); Matching error < 20 pixels?]

After matching, sub-pixel interpolation can be performed to improve measurement precision. Bicubic interpolation is chosen to obtain the sub-pixel positions of the homologous points because of its smooth interpolation effect. Because the four cameras of the PFPMS form four binocular systems with a long baseline (UL-UR, UL-LR, LL-UR, LL-LR), the final coordinates of the space point corresponding to each matched point are taken as the average of the four space coordinates calculated from that point and its homologous points. Figure 5 shows the 3D point cloud.
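The paper does not spell out how bicubic interpolation is applied, so the following is only a sketch of the interpolation itself: it evaluates the grey value of an image at a non-integer position using the common cubic convolution kernel with a = -0.5. The kernel parameter, the edge clamping and the function names are assumptions. Values sampled this way could, for example, be used to re-evaluate a similarity score on a finer grid around a matched point.

```python
from math import floor


def cubic_weight(t, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is a commonly used choice (assumed here)."""
    t = abs(t)
    if t <= 1.0:
        return (a + 2.0) * t ** 3 - (a + 3.0) * t ** 2 + 1.0
    if t < 2.0:
        return a * t ** 3 - 5.0 * a * t ** 2 + 8.0 * a * t - 4.0 * a
    return 0.0


def bicubic(img, x, y):
    """Interpolate img (rows of grey values, indexed img[y][x]) at sub-pixel (x, y).

    Uses the 4 x 4 neighbourhood around the position; pixels outside the image
    are replaced by the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    x0, y0 = floor(x), floor(y)
    value = 0.0
    for j in range(-1, 3):
        wy = cubic_weight(y - (y0 + j))
        row = img[min(max(y0 + j, 0), h - 1)]
        for i in range(-1, 3):
            wx = cubic_weight(x - (x0 + i))
            value += wy * wx * row[min(max(x0 + i, 0), w - 1)]
    return value
```

At integer positions the kernel weights reduce to a single 1, so the function returns the original pixel value, and away from the image border it reproduces a linear intensity ramp exactly.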

Triangulation and texture mapping
Generally, the surface of an object can be expressed with a triangulated irregular network. Delaunay triangulation (Tsai, 1993) is performed on the point cloud obtained from stereo matching. In order to avoid long, narrow triangles during triangulation, the length of every triangle's sides is limited. Figure 6 shows the Delaunay triangulation of the point cloud.
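The side-length limit can be applied as a filter on the output of any Delaunay triangulation. The sketch below assumes the triangles are already available as index triples into the point list; the function name and the threshold value are hypothetical, since the paper does not state the limit it uses.

```python
from math import dist


def filter_long_triangles(points, triangles, max_side):
    """Keep only the triangles whose longest side does not exceed max_side.

    points    : list of coordinate tuples (2D or 3D)
    triangles : list of (i, j, k) index triples into points
    max_side  : hypothetical length limit, in the same unit as the points
    """
    kept = []
    for tri in triangles:
        a, b, c = (points[i] for i in tri)
        if max(dist(a, b), dist(b, c), dist(c, a)) <= max_side:
            kept.append(tri)
    return kept
```

Discarding such triangles leaves holes where the point cloud is sparse instead of bridging them with stretched facets, which is usually the less misleading result for a surface model.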
In order to reconstruct a model with texture, the colour information of every point, extracted from one of the captured images, can be used for texture mapping. The UL image is selected as the texture image. For every triangle, the texture coordinates of each vertex are obtained from the image coordinates of its matched point in the texture image, and the texture coordinates of internal points are calculated by linear interpolation of the vertices' texture coordinates. The texture image is mapped automatically onto the model in this way. Finally, the 3D model of the scene is generated, as shown in Figure 7.
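Linear interpolation of texture coordinates inside a triangle is commonly written with barycentric weights. The sketch below works in a 2D parameterization of the triangle; the function names and the example coordinates are illustrative assumptions, and in practice a rendering library normally performs this step.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of the 2D point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_uv(p, tri_xy, tri_uv):
    """Texture coordinates of p, blended linearly from the three vertex values.

    tri_xy : the triangle's vertex positions in a 2D parameterization
    tri_uv : the matching texture-image coordinates of the three vertices
    """
    wa, wb, wc = barycentric(p, *tri_xy)
    u = wa * tri_uv[0][0] + wb * tri_uv[1][0] + wc * tri_uv[2][0]
    v = wa * tri_uv[0][1] + wb * tri_uv[1][1] + wc * tri_uv[2][1]
    return u, v
```

For example, with vertices at (0, 0), (4, 0), (0, 4) mapped to texture pixels (100, 200), (500, 200), (100, 600), the interior point (1, 2) has weights 0.25, 0.25 and 0.5 and maps to texture pixel (200, 400).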

EXPERIMENT RESULTS
The 3D model of the wall scene has been obtained with the matching method above, and a good result is achieved. In order to test the stability and adaptability of the method, the 3D models of Scene A and Scene B are also reconstructed. Figure 8 and Figure 9 show the results.

CONCLUSION
This paper proposes an automatic 3D reconstruction method based on multi-view stereo vision. 3D models of several scenes of cave No. 172 in the Mogao Grottoes have been reconstructed using a portable four-camera photographic measurement system. The cameras of the measurement system form two binocular systems with a short baseline and four binocular systems with a long baseline. The short-baseline systems are used for rapid matching, exploiting the small difference between their two images. The long-baseline systems are used for multiple measurements. Compared with a traditional binocular system, the PFPMS has the advantage of reducing the possibility of mismatching and improving measurement accuracy. The experimental results show the effectiveness of this matching method.
The limitation of the method is that the point cloud is not dense enough in regions with poor texture. In addition, only models of several local scenes have been reconstructed, and they are not complete. Future work will focus on obtaining a dense point cloud by introducing structured light and on stitching together the models reconstructed from different perspectives.
Figure 2. Extraction of Harris corners
N = half of the size of the template window (N is set to 10 pixels in actual matching); (x1, y1) = coordinates of the matched point in image 1.

Figure 3 shows the main matching scheme based on the epipolar constraint. Let l_{UL-LL}, l_{UL-UR}, l_{UL-LR}, l_{LL-UR}, l_{LL-LR} and l_{UR-LR} represent the epipolar lines.
a) [...] and find some possible points which have a ZNCC value above 0.9. Rank these points by ZNCC value from high to low; the top five points are chosen as the candidate matched points.
b) Let P_{LL} represent the first candidate point.
c) [...] If the maximum value is less than 0.7, remove P_{LL} from the candidate points and return to step (b).
d) Find the matched point P_{LR} in the rectangular region (40 pixels × 40 pixels) around the intersection of [...] return to step (b).
g) Calculate the matching error, defined as the sum of the distances between each matched point and the intersection of the epipolar lines of the two other matched points that have a long baseline relative to it. If the matching error is less than 20 pixels, regard the matched points as the homologous points and return to step (a) for the matching of the next point. Otherwise, remove P_{LL} from the candidate points and return to step (b).
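The matching error of step (g) can be sketched as follows, assuming the epipolar lines are already available as coefficient triples (a, b, c) of a*x + b*y + c = 0 in each image. How the lines are derived from the camera parameters is not shown, and the function names and data layout are hypothetical; only the 20-pixel threshold comes from the text.

```python
from math import hypot


def intersect(l1, l2):
    """Intersection of two image lines given as (a, b, c) with a*x + b*y + c = 0."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        raise ValueError("lines are parallel")
    return (b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w


def matching_error(points, line_pairs):
    """Sum, over the images, of the distance between the matched point and the
    intersection of the two epipolar lines drawn in that image.

    points     : {image name: (x, y)} matched point in each image
    line_pairs : {image name: (line_1, line_2)} epipolar lines in that image
                 induced by the two long-baseline partners of the point
    """
    total = 0.0
    for name, (x, y) in points.items():
        ix, iy = intersect(*line_pairs[name])
        total += hypot(x - ix, y - iy)
    return total


def is_consistent(points, line_pairs, threshold=20.0):
    """Accept the match when the matching error is below the threshold (pixels)."""
    return matching_error(points, line_pairs) < threshold
```

A correct set of homologous points should lie close to the intersections of the corresponding epipolar lines, so a large sum indicates that at least one of the matches is wrong.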

Figure 4. The processing flow of the matching method