HYBRID-BASED DENSE STEREO MATCHING

Stereo matching that generates accurate and dense disparity maps is an indispensable technique for 3D exploitation of imagery in the fields of computer vision and photogrammetry. Although numerous solutions and advances have been proposed in the literature, occlusions, disparity discontinuities, sparse texture, image distortion, and illumination changes still pose problems and await better treatment. In this paper, a hybrid method based on semi-global matching (SGM) is presented to tackle the challenges of dense stereo matching. To ease the sensitivity of SGM cost aggregation to its penalty parameters, a formal way to provide proper penalty estimates is proposed. To this end, the study uses shape-adaptive cross-based matching with an edge constraint to generate an initial disparity map for penalty estimation. Image edges, which indicate the potential locations of occlusions and disparity discontinuities, are detected by the edge drawing algorithm so that the local support regions do not cover significant disparity changes. In addition, a penalty parameter $P_e$ is imposed on the energy function of SGM cost aggregation to specifically handle edge pixels. Furthermore, the final disparities of edge pixels are found by weighting the values derived from the SGM cost aggregation and the U-SURF matching, providing more reliable estimates in disparity discontinuity areas. Evaluations on the Middlebury stereo benchmark demonstrate satisfactory performance and reveal the potency of the hybrid-based dense stereo matching method.


INTRODUCTION
Stereo matching is the problem of recovering corresponding points from different image views, and is one of the indispensable ingredients for 3D exploitation of imagery in the fields of photogrammetry and computer vision. Many different stereo vision algorithms have been proposed. An overview of state-of-the-art methods can be found in the Middlebury Stereo benchmark (Scharstein and Szeliski, 2002). Current stereo matching techniques can be categorized into pixel-wise, local and global algorithms. Pixel-wise matching methods simply take an optimal disparity for each image component (Birchfield and Tomasi, 1999). Local algorithms aggregate the support from the neighboring pixels in a constrained region to determine the disparity without considering neighboring connections. These methods may suffer from a lack of smoothness and fail when the support regions of pixels contain repetitive patterns, disparity discontinuities, and occlusions. Studies such as adaptive windows (Kanade and Okutomi, 1994; Scharstein and Szeliski, 2002), multiple windows (Hirschmuller et al., 2002), and image segmentation (Gerrits and Bekaert, 2006) adapt the shape and size of the support region or adjust the support weights of the region pixels near depth discontinuities to improve disparity estimates. The adaptive-window method finds an optimal window based on the local variation of intensity and disparity, while the multiple-window method calculates the correlation with nine pre-defined windows and selects the disparity with the smallest matching cost. These methods, nevertheless, are not suitable for arbitrarily shaped depth discontinuities because they rely on a rectangular window. Other approaches can also be found in the literature: Xu et al. (2002) resolved adaptive support weights by radial computations; Yoon and Kweon (2006) assigned a support weight to each pixel in a support region based on color similarity and geometric proximity; and Zhang et al. (2009) used color similarity and connectivity constraints to construct an upright cross local support skeleton for each anchor pixel and dynamically built a shape-adaptive full support region. Typically, local methods require less computation but attain lower accuracy than global methods. Also, deciding an appropriate support region for each pixel remains an inevitable challenge for local stereo matching algorithms.
On the other hand, global algorithms seek optimal pixel matches by minimizing a global energy function that consists of a data term and a smoothness term, which penalize inconsistent solutions and enforce the piecewise smoothness assumption, respectively. The data term represents the coherence of a region, often measured from image color, luminance, and texture. The smoothness term assigns a large penalty to neighboring pixels that carry different disparity values, so that similar pixels can be merged. Various optimization techniques, such as graph cuts (Boykov et al., 2001; Wang et al., 2013), belief propagation (BP) (Sun et al., 2003; Felzenszwalb and Huttenlocher, 2006; Klaus et al., 2006) and dynamic programming (DP) (Ohta and Kanade, 1985; Gong and Yang, 2003; Torr and Criminisi, 2004), are often used to determine a local minimum of the energy function. Compared to local methods, global methods render better quality of the estimated depth but involve high computational complexity, which makes them inapplicable to near real-time or real-time applications.
In addition, Hirschmuller (2008) proposed the popular semi-global matching (SGM) to achieve high-precision depth estimation; it combines the advantages of both global and local algorithms and can meet real-time demands. SGM approximates a global optimization by combining several local optimization steps. The energy function of SGM comprises a data term, a smoothness term for slight changes in disparity, and a larger smoothness term for depth discontinuities with significant disparity changes. Klette et al. (2011) tested different stereo algorithms, including variants of DP, BP, and SGM, and suggested that SGM can potentially deal with scenes of high depth complexity and is the most promising. Numerous variants of SGM, such as PlaneFitSGM (Humenberger et al., 2010), wSGM (Spangenberg et al., 2013), SGMDDW (Michael et al., 2013), rSGM (Spangenberg et al., 2014), iSGM (Hermann and Klette, 2013), eSGM (Hirschmuller, 2012), and a real-time implementation (Banz et al., 2010), have been proposed to improve accuracy and reduce computational complexity. Banz et al. (2012) evaluated four different penalty functions under two matching cost calculations, namely the census and rank transformations (Zabih and Woodfill, 1994), and concluded that the linear and inversely proportional penalty functions significantly outperform the constant and variance-based ones. Besides, for all penalty functions, using the census transform instead of the rank transform yields better disparity maps with less edge blurring because the census transform retains spatial information.
Furthermore, image features are frequently used to assist in stereo matching as well (Sadeghi et al., 2008). Yang and Wang (2015) sought to improve an adaptive support weight approach by incorporating Canny edges. Poddar et al. (2015) employed image segmentation and feature point segment matching techniques for dense disparity estimation. Xiao et al. (2013) introduced ground control points into the energy function of SGM (GCP-SGM) as soft constraints for aerial image applications. Zhu et al. (2011) extended the stereo matching technique to aerial and satellite image sequences. Konrad and Lan (2000) combined block-based disparity estimation with feature point matching.
In this paper, a hybrid-based approach is proposed to perform disparity estimation from rectified/epipolar stereo pairs. As suggested in Tian et al. (2013), the matching cost is determined through the census transformation (CT) combined with the absolute differences (AD). A gradual strategy, involving edge constraints and penalty estimation, is presented to alleviate the sensitivity to the penalty parameters in the cost aggregation step, so that more accurate disparity estimation can be achieved. Image edges, which typically indicate depth discontinuities, are detected by the edge drawing (ED) algorithm (Akinlar and Topal, 2011) and treated as the edge constraint. Moreover, to reduce potential disparity errors and to enhance the quality of depth estimation at image edges (object boundaries), this study uses the U-SURF descriptor (Bay et al., 2008) to recover the edge disparities in parallel, and weights the values derived from the stereo and U-SURF matching to find the final disparities at edge pixels. Notably, no post-interpolation process is applied in this paper, so that the estimated results can be evaluated properly.
The rest of the paper is organized as follows: Section 2 briefly reviews the relevant techniques and introduces the methodology. Experimental outcomes are presented in Section 3. Finally, conclusions are drawn in Section 4.

HYBRID-BASED STEREO MATCHING
The study conceptually integrates shape-adaptive cross-based local stereo matching and semi-global matching with a pixel-wise edge constraint to improve the quality of disparity estimation. The proposed method starts by categorizing image content into edge and interior pixels. For disparity initialization, the edge pixels are passed to a U-SURF matching process while the interior pixels are submitted to a gradual stereo matching process. In addition, a penalty estimation method and an image edge constraint are imposed on the stereo matching procedure to enhance the matching performance, and the edge disparity results derived from the two matching processes are weighted for better estimation in depth discontinuity areas. Figure 1 shows the block diagram of the proposed working scheme.

U-SURF matching
Edge pixels in each image are detected by ED, which requires no parameter tuning and is capable of real-time processing. The study constructs a U-SURF descriptor for each edge pixel and performs 1-D matching along its epipolar line within the disparity search range, as illustrated in Figure 2. The U-SURF descriptor provides a unique and robust description of the intensity distribution of the surrounding pixels. A square region centered on the edge pixel is extracted and split into 4x4 square sub-regions, from which Gaussian-weighted Haar responses are extracted. The best candidate match is found through the best-bin-first search (Beis and Lowe, 1997), which identifies the nearest neighbor, defined as the candidate with the minimum Euclidean distance from the descriptor vector. The probability of a correct match is assessed from the ratio of the distance to the closest neighbor to the distance to the second closest. Subsequently, a quasi-RANSAC approach (Stamatopoulos et al., 2012) is applied for outlier removal. Finally, the preliminary disparities of edge pixels are resolved. Notably, in order to prevent outliers, strict thresholds are applied to the matching ratio test and the quasi-RANSAC process. These thresholds can be further fine-tuned considering the percentage of matched edge pixels:
$$P_{match} = \frac{N_{match}}{N_{edge}} \times 100\% \quad (1)$$
where $P_{match}$ = percentage of matched edge pixels
$N_{match}$ = number of matched edge pixels
$N_{edge}$ = number of detected edge pixels
In this paper, with a ratio-test threshold of 0.6 and a one-pixel allowance for outlier removal, an average matching rate of 64.3% is obtained, as portrayed in Figures 3(a) and 3(b). The disparities of the unmatched edge pixels are then explored together with the interior pixels in the subsequent stereo matching process.
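The ratio test and the matched-edge percentage described above can be illustrated with a minimal sketch. It uses a brute-force nearest-neighbor search rather than best-bin-first, omits the quasi-RANSAC step, and accepts a lone candidate without a ratio test; the function names and the input layout are hypothetical, not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_match(query, candidates, max_ratio=0.6):
    """Return the index of the accepted candidate, or None if ambiguous.

    candidates: descriptors of edge pixels on the same epipolar line within
    the disparity search range (hypothetical input layout).
    """
    if not candidates:
        return None
    dists = sorted((euclidean(query, c), i) for i, c in enumerate(candidates))
    if len(dists) == 1:
        # Assumption of this sketch: a single candidate is accepted as is.
        return dists[0][1]
    (d1, best), (d2, _) = dists[0], dists[1]
    # Reject when the closest neighbor is not clearly better than the second.
    if d2 == 0.0 or d1 / d2 > max_ratio:
        return None
    return best

def matched_percentage(n_matched, n_detected):
    """Percentage of detected edge pixels that received a match."""
    return 100.0 * n_matched / n_detected if n_detected else 0.0
```

A lower `max_ratio` rejects more ambiguous matches, which lowers the matched percentage but leaves fewer outliers for the subsequent removal step.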

Initial cost aggregation:
SGM cost aggregation aims to minimize several 1-D energy functions to approximate a global 2-D energy minimization problem. Typically, an energy function $E(D)$ comprising a data term and a smoothness term can be formulated as (Hirschmuller and Scharstein, 2009):
$$E(D) = \sum_{p} \Big( C(p, D_p) + \sum_{q \in N_p} P_1\, T\big[|D_p - D_q| = 1\big] + \sum_{q \in N_p} P_2\, T\big[|D_p - D_q| > 1\big] \Big) \quad (4)$$
The data term sums the pixel-wise matching cost $C(p, D_p)$ over all pixels $p$ at their disparities $D_p$; the smoothness term involves two penalty parameters ($P_1$, $P_2$), and $T[\cdot]$ equals 1 if its argument is true and 0 otherwise. $P_1$ penalizes the neighboring pixels $N_p$ of $p$ if their disparity difference is equal to 1; $P_2$ is imposed on disparity changes larger than 1. That is, $P_1$ permits an adaptation to slanted or curved surfaces, while $P_2$ preserves discontinuities, as they mostly coincide with intensity variations. Accordingly, the value of $P_2$ should be larger than $P_1$.
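To make the roles of the two penalties concrete, the following sketch evaluates the energy of a disparity assignment on a single scanline. It is an illustration only: each pair of adjacent pixels is penalized once, and the function name and the `costs[x][d]` layout are assumptions of this example.

```python
def scanline_energy(costs, disparities, p1, p2):
    """Energy of one scanline: data term plus P1/P2 smoothness penalties.

    costs[x][d] is the matching cost of pixel x at disparity d (assumed layout).
    """
    energy = 0.0
    for x, d in enumerate(disparities):
        energy += costs[x][d]            # data term C(p, D_p)
        if x > 0:
            jump = abs(d - disparities[x - 1])
            if jump == 1:
                energy += p1             # small change: slanted or curved surface
            elif jump > 1:
                energy += p2             # large change: disparity discontinuity
    return energy
```

With a small $P_2$ a noisy assignment with many large jumps costs little extra, whereas a very large $P_2$ makes any discontinuity expensive, which is the over-smoothing behavior discussed in the text.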
Basically, the quality of the disparity map relies significantly on the two empirically determined parameters of standard SGM (Hermann et al., 2009). If the value of $P_2$ is too small to smooth the disparity map, most noise remains. Conversely, the disparity map is over-smoothed with an extremely large $P_2$. Hirschmuller (2008) first introduced an adaptive $P_2$ function to penalize abrupt disparity changes according to the image content. Banz et al. (2012) evaluated three further penalty functions and concluded that those negatively and inversely proportional to the absolute luminous intensity gradient of the currently processed pixels along the path significantly outperform the variance-based approach. However, the parameters involved in each function still need to be tuned manually by trial and error, with a step that is big enough to ensure sufficiently different configurations and small enough not to miss local minima.
In the current work, a penalty estimation and the edge constraint are introduced into the cost aggregation step to facilitate the disparity computation. The aggregation process, comprising initial disparity generation, penalty estimation, and disparity determination, is described as follows.

Initial disparity generation:
The study leverages the shape-adaptive cross-based matching approach (Zhang et al., 2009) with the edge constraint to generate an initial disparity map.
The edge constraint prevents the adaptive support regions from crossing significant disparity changes, which is appropriate for pixels near arbitrarily shaped depth discontinuities. The adaptive cross-based algorithm is performed on the interior pixels to search their support regions in four directions (up, down, left and right). Each arm stretches from the central pixel until two consecutive pixels are not consistent in color, the maximum arm length is reached, or an edge pixel is touched. Edge pixels are excluded from this measure. The arm length $r^*$ is defined as in Zhang et al. (2009) (Equation 5). To enhance the consistency of disparity values within the support region, an efficient OII voting technique (Zhang et al., 2009), which decomposes the matching cost aggregation into two orthogonal 1-D integration steps, is used. The disparity value with the maximum population within the support region is taken as the result for the interior pixel. Accordingly, the disparities of edge and interior pixels are joined to generate the initial disparity map.
Figure 4. Demonstrations of the shape-adaptive cross-based matching cost aggregation with the edge constraint, panels (a) and (b).
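The arm growth under the edge constraint can be sketched for the left arm of one pixel as follows. This is a simplified illustration: it assumes that each arm pixel is compared with the anchor pixel (comparing consecutive pixels instead would only change the `color_similar` call), and the function names and the per-row input layout are hypothetical.

```python
def color_similar(c1, c2, tau):
    """True if the largest per-band absolute difference is within tau."""
    return max(abs(a - b) for a, b in zip(c1, c2)) <= tau

def left_arm_length(row_colors, row_is_edge, x, max_len, tau):
    """Length of the left arm of the support cross at column x of one row.

    The arm stops at the image border, at an edge pixel, at a color
    inconsistency, or when the preset maximum arm length is reached.
    """
    length = 0
    for i in range(1, max_len + 1):
        xi = x - i
        if xi < 0 or row_is_edge[xi]:
            break                        # border or edge constraint reached
        if not color_similar(row_colors[x], row_colors[xi], tau):
            break                        # color no longer consistent
        length = i
    return length
```

The other three arms follow by mirroring the index arithmetic; the union of the horizontal segments of all pixels on the vertical segment then gives the support region.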

Penalty estimation:
Considering the diversity of image content, it is not trivial to set the penalty parameters in the cost aggregation step automatically. This study presents a convenient way to estimate proper penalty parameters for the SGM cost aggregation. A consistency check is first performed to rectify the initial disparity map, eliminating occlusions and false matches, as shown in Figure 5(c). Then, a matching cost ratio $R$ is determined for each of the $N$ inspected pixels from its smallest and secondary smallest matching costs (Equation 7),
where $R$ = the ratio of matching cost
$C_{min}$ = the smallest matching cost
$C_{sec}$ = the secondary smallest matching cost
$N$ = number of inspected pixels
A larger $R$ of a pixel represents a higher confidence in its disparity estimate. Thus, a map of the matching cost ratio can be generated, as in Figure 5(d). Confident pixels are then selected if their ratios are larger than a threshold $T_R$. $P_2$ is determined from the confident pixels by taking the average of the differences between the smallest matching cost and the secondary smallest one, and $P_1$ is half of $P_2$:
$$P_2 = \frac{1}{N_c} \sum_{i=1}^{N_c} \big( C_{sec,i} - C_{min,i} \big), \qquad P_1 = \frac{P_2}{2} \quad (8)$$
where $N_c$ = number of confident pixels
Equation 8 gives penalty estimates for automated parameter setting in the energy function. Still, the values can be fine-tuned if the matching result suggests this is necessary. The potency of the penalty estimation is demonstrated in the evaluation section.
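A minimal sketch of the penalty estimation is given below. The averaging step follows the description above, but the exact form of the matching cost ratio is not reproduced here, so the ratio of the secondary smallest to the smallest cost is used purely as an assumption; the function name and the input layout are likewise hypothetical.

```python
def estimate_penalties(pixel_costs, ratio_threshold):
    """Estimate (P1, P2) from pixels that passed the consistency check.

    pixel_costs: one list of matching costs per inspected pixel, with one
    cost per candidate disparity (at least two candidates are assumed).
    """
    diffs = []
    for costs in pixel_costs:
        ordered = sorted(costs)
        c_min, c_sec = ordered[0], ordered[1]
        # Assumed confidence ratio: secondary smallest over smallest cost.
        ratio = c_sec / c_min if c_min > 0 else float("inf")
        if ratio > ratio_threshold:      # keep confident pixels only
            diffs.append(c_sec - c_min)
    if not diffs:
        raise ValueError("no confident pixels; lower the ratio threshold")
    p2 = sum(diffs) / len(diffs)         # mean gap between the two best costs
    return p2 / 2.0, p2                  # P1 is half of P2
```

Because $P_2$ is expressed in the same units as the matching cost, the estimate scales automatically with whichever cost function is used.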

Disparity determination:
Standard SGM aggregates $E(D)$ along 1-D paths from eight directions toward each pixel of interest using dynamic programming. As mentioned in Section 2.2.2, an overly large $P_2$ would over-smooth the disparity map. Thus, a relatively smaller penalty parameter should be used at image edge pixels to preserve discontinuities that coincide with intensity variations. As a result, the study imposes the edge constraint and applies an additional edge penalty parameter $P_e$ to the energy function at edge pixels (Equation 9),
where $P_e$ = penalty parameter for edge pixels, $P_e = (P_1 + P_2)/2$
$e$ = edge pixels
The value of the edge penalty parameter $P_e$ is thus the average of $P_1$ and $P_2$. The disparity is then retrieved by a winner-takes-all strategy. Furthermore, the resultant disparities of edge pixels ($D_s$) are weighted with the preliminary edge disparities ($D_u$) acquired from U-SURF matching to obtain the final disparities of edges (Equation 10),
where $w$ = weight value, $w \in [0, 1]$
$D_w$ = weighted edge disparities
$D_s$ = edge disparities derived from stereo matching
$D_u$ = edge disparities derived from U-SURF matching
Finally, the disparity map of the whole image is determined.
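The sketch below illustrates one aggregation path, the winner-takes-all selection, and the edge weighting. It uses the standard path-wise SGM recurrence of Hirschmuller (2008) along a single left-to-right path, whereas the full method sums eight paths. Two points are assumptions of this example rather than statements of the paper's equations: $P_e$ replaces $P_2$ at edge pixels, and the weight $w$ multiplies the stereo-matching disparity. All function names are hypothetical.

```python
def aggregate_path(costs, is_edge, p1, p2):
    """Aggregate costs[x][d] along one left-to-right path.

    Assumption: at edge pixels the large-jump penalty P2 is replaced by
    P_e = (P1 + P2) / 2.
    """
    p_e = (p1 + p2) / 2.0
    n_disp = len(costs[0])
    inf = float("inf")
    aggregated = [list(costs[0])]
    for x in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        big = p_e if is_edge[x] else p2
        row = []
        for d in range(n_disp):
            best = min(
                prev[d],                                  # same disparity
                prev[d - 1] + p1 if d > 0 else inf,       # change of one level
                prev[d + 1] + p1 if d + 1 < n_disp else inf,
                prev_min + big,                           # larger jump
            )
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(costs[x][d] + best - prev_min)
        aggregated.append(row)
    return aggregated

def winner_takes_all(aggregated):
    """Pick the disparity with the smallest aggregated cost at each pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in aggregated]

def fuse_edge_disparity(d_sgm, d_usurf, w):
    """Weighted edge disparity; which term w multiplies is an assumption."""
    return w * d_sgm + (1.0 - w) * d_usurf
```

With `w = 1.0` the edge disparity comes from stereo matching alone, and with `w = 0.0` from U-SURF matching alone.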

EVALUATION
The experiments aim to evaluate the effectiveness of the proposed hybrid-based matching method in two phases. Test images (Tsukuba, Venus, Teddy, and Cones) of the Middlebury stereo benchmark (Scharstein and Szeliski, 2008) were used. First, we compared the penalty estimation with manual configurations of $(P_1, P_2) \in \{(0.1, 2), (0.01, 5), (1, 10)\}$ for disparity map generation. Figure 6 shows the proportion of disparity errors for the four test images. It reveals that our method obtains the best accuracy in both disparity discontinuity and non-occlusion areas. Manual setting of $P_2$ performs well if carefully adjusted to the image, but quality degrades rapidly as the values are changed. In the resultant disparity maps (Tsukuba image, for instance) shown in Table 1, the proposed penalty estimation outperforms all the results obtained from the manual parameter settings, particularly in the region pointed out by the red arrow.
Table 1. Results of the disparity map (panels include $(P_1, P_2) = (1, 10)$ and the proposed method).
The study imposes the constrained shape-adaptive cross-based local method on SGM to alleviate the sensitivity to the penalty parameters. Thus, we combined three additional local methods, namely an adaptive support-weight approach (ASW) (Yoon and Kweon, 2006), a fixed window-based approach (FW), and a cross-based method (CB) (Zhang et al., 2009), each with the SGM cost aggregation to evaluate the merits of the proposed method. Table 2 shows the proportion of disparity errors for each set of stereo matching methods. Table 3 presents the resultant disparity maps of the Tsukuba image and the edge disparities derived from the U-SURF matching.
Table 3. The resultant disparity maps (panels include the proposed method and U-SURF matching).
In light of Table 2 and Table 3, SGM alone results in a less smooth disparity map because its cost aggregation step is susceptible to the values of the penalty parameters. Noise is effectively reduced when the local methods are incorporated into SGM. This indicates that the strategy of producing an initial disparity map to facilitate the SGM cost aggregation is practically feasible. Moreover, among these approaches, the proposed method achieves the best accuracy for disparity estimation. Because of the imposition of the edge constraint, the disparity errors of our method are relatively small, in particular close to discontinuity areas. These results also suggest that the more accurate the initial disparity, the higher the quality of the final disparity estimation.
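The error proportions reported in this section count pixels whose disparity deviates from the ground truth by more than one pixel. A minimal sketch of such a measure, assuming flattened disparity lists and an optional region mask (for example non-occluded or discontinuity pixels), is given below; the function name and arguments are hypothetical.

```python
def bad_pixel_rate(estimated, ground_truth, mask=None, threshold=1.0):
    """Percentage of evaluated pixels with |d_est - d_gt| > threshold."""
    if mask is None:
        mask = [True] * len(ground_truth)
    evaluated = 0
    bad = 0
    for d_est, d_gt, keep in zip(estimated, ground_truth, mask):
        if not keep:
            continue                     # pixel lies outside the evaluated region
        evaluated += 1
        if abs(d_est - d_gt) > threshold:
            bad += 1
    return 100.0 * bad / evaluated if evaluated else 0.0
```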

CONCLUSIONS AND FUTURE WORK
In this paper, the effectiveness of the proposed hybrid-based stereo matching approach has been verified with the Middlebury stereo benchmark. The study presents the shape-adaptive cross-based matching approach with the edge constraint to generate an initial disparity map for the penalty estimation. As a consequence, the sensitivity of the SGM cost aggregation to the penalty parameters is alleviated. In summary, the study imposes the edge constraint on the energy function of SGM and integrates the disparities of edge pixels derived from U-SURF matching to improve the accuracy in disparity discontinuity regions. Reducing the computational complexity and further assessment will be studied in future work.

Figure 1. The block diagram of the proposed method.

Figure 2. Illustration of the edge pixel matching.
Figures 3(a) and 3(b) portray the matched edge pixels. Figure 3(c) demonstrates the matching results of the first 170 corresponding edge pixels along their epipolar lines. On the other hand, unmatched edge pixels largely result from occlusions and from dissimilar edge detection results in the two images, as pointed out in Figures 3(d) and 3(e). In Figure 3, the yellow dots indicate the detected edges and the red dots the matched edge pixels.
Figure 3. Demonstration of the edge pixel matching: (a) left image; (b) right image; (c) corresponding edge pixels (the first 170 matches); (d) enlarged look of left image; (e) enlarged look of right image.
Matching cost:
For accurate stereo matching, it is important to choose an appropriate matching cost metric for each pixel. In light of the evaluations in Hirschmuller and Scharstein (2009), the census transformation (Zabih and Woodfill, 1994) proved to have the most balanced performance and better adaptability to intensity distortion caused by illumination variation. The matching cost is decided from the relationship between the regions around two pixels instead of relying directly on the intensity values. Nevertheless, the census transformation is inadequate for image areas with little texture or repeated patterns and has a lower tolerance to image noise. Tian et al. (2013) suggested that the absolute difference function gives better matching in sparsely textured areas and can thus compensate for the deficiency of the census transformation. Therefore, the combination of these two matching cost functions is preferred in this paper. The integration of the CT and AD matching costs can be expressed as (Tian et al., 2013):
$$C(p, d) = \rho(C_{CT}(p, d), \lambda_{CT}) + \rho(C_{AD}(p, d), \lambda_{AD}) \quad (2)$$
where $d$ = the presupposed disparity value
$C_{CT}$ = the matching cost of CT
$C_{AD}$ = the matching cost of AD
The census transformation, the two matching costs, and the normalization function $\rho(C, \lambda)$ are formulated as:
$$\begin{aligned} CT(I(p)) &= \bigotimes_{p' \in N(p)} \xi(p, p'), \qquad \xi(p, p') = \begin{cases} 1 & I(p') < I(p) \\ 0 & \text{otherwise} \end{cases} \\ C_{CT}(p, d) &= D_H\big(CT(I_L(x, y)),\, CT(I_R(x - d, y))\big) \\ C_{AD}(p, d) &= \min\{ |I_L(x, y) - I_R(x - d, y)|,\, T \} \\ \rho(C, \lambda) &= 1 - \exp(-C / \lambda) \end{aligned} \quad (3)$$
$I_L(x, y)$ and $I_R(x, y)$ indicate the gray values of corresponding pixels in the left and right images, respectively; $I(p)$ is the luminance value of pixel $p$; $N(p)$ is the $p$-centered transform window; $p'$ indicates the neighboring pixels; $D_H$ denotes the Hamming distance between two census bit strings. $\lambda_{CT}$, $\lambda_{AD}$, and $T$ are prior parameters. The normalization function prevents the matching cost from leaning overly toward one cost type. $\lambda$ controls the weights of the two matching costs, which helps combine the advantages of CT and AD for better adaptability to intensity distortion and noise. Details of Equations 2 and 3 can be found in Tian et al. (2013).
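The combined CT and AD cost can be sketched for gray-value images stored as nested lists. This is an illustration under stated assumptions: a 3x3 census window that lies fully inside the image, a census bit set when the neighbor is darker than the center, and hypothetical function and parameter names (`lam_ct`, `lam_ad`, `trunc`).

```python
import math

def census(img, x, y, radius=1):
    """Census bit list of the window centered on (x, y)."""
    center = img[y][x]
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            # Assumed convention: bit is 1 when the neighbor is darker.
            bits.append(1 if img[y + dy][x + dx] < center else 0)
    return bits

def hamming(a, b):
    """Number of positions at which two bit lists differ."""
    return sum(u != v for u, v in zip(a, b))

def rho(cost, lam):
    """Normalization that maps a non-negative cost into [0, 1)."""
    return 1.0 - math.exp(-cost / lam)

def combined_cost(left, right, x, y, d, lam_ct, lam_ad, trunc):
    """Sum of the normalized census cost and truncated absolute difference."""
    c_ct = hamming(census(left, x, y), census(right, x - d, y))
    c_ad = min(abs(left[y][x] - right[y][x - d]), trunc)
    return rho(c_ct, lam_ct) + rho(c_ad, lam_ad)
```

Because both terms are normalized to the same range, neither the Hamming distance nor the intensity difference dominates the sum merely because of its scale.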
$$r^* = \max_{r \in [1, L]} \Big( r \prod_{i \in [1, r]} \delta(p, p_i) \Big) \quad (5)$$
where $p_i = (x_p - i, y_p)$ and $p, p_i \notin$ edge pixels; $L$ is the preset maximum arm length. $\delta(p, p_i)$ is an indicator function evaluating the color similarity between consecutive pixels based on all color bands:
$$\delta(p_1, p_2) = \begin{cases} 1, & \max_{c \in [R, G, B]} |I_c(p_1) - I_c(p_2)| \le \tau \\ 0, & \text{otherwise} \end{cases} \quad (6)$$
where $I_c$ is the intensity of color band $c$, and $\tau$ controls the confidence level of color similarity. Under the constraint of edges, the support regions of each interior pixel are constructed over the whole image. The pixels ($v_p^+$, $v_p^-$, $h_p^+$, $h_p^-$) in Figure 4(a) define the horizontal segment H(p) and vertical segment V(p). The expansion of the combined local cross H(p) ∪ V(p) determines the integrated aggregation region U(p). Figure 4(b) demonstrates a result of support regions with the edge constraint, where the edges confining the support regions are colored in yellow.
Figure 5. The generation of a matching cost ratio map (panels include error pixels in black and (d) the matching cost ratio).

Table 2. The proportion of the disparity error (> 1 pixel).