IMAGE-BASED NAVIGATION OF FOREST HARVESTERS

The focus of this paper is the use of multi-image matching techniques in forestry applications. The background of the study is the problem of navigating heavy harvesters through skidder trails on their way to harvesting individual trees. Maneuvering these heavy vehicles over unprotected forest ground leads to irreversible soil compression and degradation. Therefore, harvester operators strive to navigate so that exactly the same (already compressed) path is used when they enter a skidder trail a second time. This task requires vehicle navigation with decimeter-level accuracy. Data from existing techniques such as GPS, IMU and/or odometry are error-prone because of fluctuating satellite signal strength caused by dense plant canopy, IMU drift without updates, and slippery, rough ground that degrades wheel-based odometry. A camera, as a passive sensor, may avoid these problems, as it is largely independent of those external influences.


INTRODUCTION
Modern forest industries need heavy and fast harvesting machines to be economical. Although these vehicles are equipped with modern navigation systems, forest ground may be compressed and damaged irreversibly. GPS, IMU and odometry are error-prone because of dense plant canopy, IMU drift and slippery ground. Therefore, an alternative or additional system should support the existing devices. An image-based approach might help to avoid these problems and increase the accuracy of localization. Captured images can be oriented by photogrammetric algorithms and deliver information about the motion of the camera. In combination with GPS, the estimated motion track of a harvester could be transformed into a global coordinate system. All subsequent forest machines may then follow this more accurately tracked path to avoid irreversible soil compression beyond the predefined skidder trails. The aim of this study is to evaluate image-based navigation for forest harvesters. In a first step, images have been captured manually along a nearly straight path within a forest to simulate a harvester path without vibration and without large changes in motion. Fast and reliable feature detectors such as SIFT and SURF have been used to find tie points in overlapping image areas. These tie points have then been used to estimate the relative orientation between all images and to calculate their exterior orientation. As a result, the driven track can be visualized as a 3D trajectory when the images taken along the path are processed in the corresponding order.

RELATED WORK
State-of-the-art forest navigation is based only on GPS and delivers an accuracy of about 5 to 10 m (Hamberger, 2001) because of shielding by the plant canopy. Vision-based approaches might aid error-prone systems such as GPS and IMU, so that navigation remains possible in spite of signal loss or drift effects. This kind of navigation requires successively captured images with overlapping regions. Image position and orientation can be reconstructed by relative orientation based on feature points. Feature detectors such as SIFT (Scale Invariant Feature Transform) (Lowe, 2004) or SURF (Speeded Up Robust Features) (Bay et al., 2006) are scale and rotation invariant and adequate for tie point detection. Structure from Motion (SfM) software packages such as Bundler (Snavely et al., 2007) take unordered sets of images and produce 3D reconstructions of all camera positions and the scene geometry. SfM tools mostly use a sparse bundle adjustment to compute the results and can therefore be used only in retrospect. Another approach, reported in (Davison, 2003), offers a real time application for vision-based simultaneous localization and mapping (VSLAM). Hence, it is even possible to reconstruct a path with a monocular device.

SENSOR AND DATA
For the first experiments, images have been captured manually with a NIKON D700 and a NIKON NIKKOR 20 mm wide-angle lens, as shown in figure 1 and summarized in table 1.

METHODOLOGY
In these experiments, two different approaches to pose estimation and 3D data acquisition in woodland by relative orientation have been evaluated. Essentially, known structure from motion algorithms have been utilized, but here they are employed in the context of forest environments, which previously did not receive adequate attention. A REAL-TIME PROCESS estimates the orientation of two subsequent images with SURF (Bay et al., 2006) or SIFT (Lowe, 2004) feature points and calculates the parameters of motion (X, Y, Z, ω, φ, κ) between the two images. As a second approach, POST-PROCESSING calculates the positions of the images with SIFT features in a combined orientation and bundle adjustment (Bundler (Snavely et al., 2007) and (Lourakis and Argyros, 2009)). This bundle adjustment improves the position and orientation of each image in the sequence through the higher redundancy of more observations and by distributing residual errors in equal parts over all points of view.
Figure 2: Example image of captured stack with 36 frames overall
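The rotation part (ω, φ, κ) of the motion parameters mentioned above can be turned into a rotation matrix. The following is a minimal sketch, assuming the common photogrammetric order R = R_x(ω) R_y(φ) R_z(κ) with angles in radians; the paper does not state which convention was used, and the function names are hypothetical.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_from_opk(omega, phi, kappa):
    """Rotation matrix R = R_x(omega) R_y(phi) R_z(kappa), angles in radians.

    The order of the elementary rotations is an assumption, not taken
    from the paper.
    """
    co, so = math.cos(omega), math.sin(omega)
    cp, sp = math.cos(phi), math.sin(phi)
    ck, sk = math.cos(kappa), math.sin(kappa)
    r_x = [[1.0, 0.0, 0.0], [0.0, co, -so], [0.0, so, co]]
    r_y = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]
    r_z = [[ck, -sk, 0.0], [sk, ck, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(matmul3(r_x, r_y), r_z)


def rotate(r, v):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return [sum(r[i][j] * v[j] for j in range(3)) for i in range(3)]


# A pure kappa rotation of 90 degrees turns the x-axis into the y-axis.
y_axis = rotate(rotation_from_opk(0.0, 0.0, math.pi / 2), [1.0, 0.0, 0.0])
```

Together with the translation (X, Y, Z), such a matrix describes the relative pose between two images.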

Post Processing Approach
A post-processing algorithm can be employed to compute the camera positions and orientations of an image set after the whole capture is complete. Further algorithms then use these oriented images and their positions to estimate a dense point cloud of the surrounding environment. The post-processing algorithm uses the Bundler software (Snavely et al., 2007) for image orientation, which requires a stack of images with overlapping areas. First, globally unique feature points inside the overlapping regions are needed. These are located by a SIFT feature detector, which yields a robust feature descriptor for each position. The SIFT feature descriptor specifies a globally unique feature and can be compared with feature points from different images of the stack. An approximate nearest neighbors (ANN) matching method among the features, in combination with a computed fundamental matrix and RANSAC (Random Sample Consensus) (Fischler and Bolles, 1981), delivers a robust estimator for the parameters of the relative orientations. Furthermore, the bundle adjustment connects all detected homologous feature points and the approximated focal length to reduce the error of the observations using a least squares method based on the Gauss-Markov model. In addition to camera positions and 3D points, the software computes focal length and distortion, so it is not necessary to calibrate the camera beforehand.
Results of Bundler can be imported into the PMVS2 (Patch-based Multi-view Stereo) software (Furukawa and Ponce, 2010) to create a dense point cloud of the environment. "PMVS is a multi-view stereo software that takes a set of images and camera parameters, then reconstructs 3D structure of an object or a scene visible in the images." (Furukawa, 2010)
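The role of the fundamental matrix in the RANSAC step described above can be illustrated with the epipolar constraint x2^T F x1 = 0, which a correct point pair satisfies and a wrong match generally violates. The following is a minimal sketch, not Bundler's implementation: it assumes normalized image coordinates (so that F equals the essential matrix E = [t]x R), uses the plain algebraic residual rather than a geometric distance, and all function names, points and the threshold are hypothetical.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def cross_matrix(t):
    """Skew-symmetric matrix [t]x such that [t]x v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def essential_from_pose(r, t):
    """E = [t]x R for a second camera with X_cam2 = R X_cam1 + t."""
    return matmul3(cross_matrix(t), r)


def epipolar_residual(f, x1, x2):
    """Algebraic residual x2^T F x1 for homogeneous image points."""
    fx1 = [sum(f[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * fx1[i] for i in range(3))


def split_inliers(f, pairs, threshold):
    """Separate point pairs into inliers and outliers of the epipolar model."""
    inliers, outliers = [], []
    for x1, x2 in pairs:
        if abs(epipolar_residual(f, x1, x2)) < threshold:
            inliers.append((x1, x2))
        else:
            outliers.append((x1, x2))
    return inliers, outliers


def project(point, r, t):
    """Project a 3D point into a camera with X_cam = R X + t (normalized)."""
    xc = [sum(r[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return [xc[0] / xc[2], xc[1] / xc[2], 1.0]


# Synthetic example: second camera shifted sideways, one corrupted match.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [1.0, 0.0, 0.0]
E = essential_from_pose(identity, shift)
points = [[0.5, 0.2, 4.0], [-1.0, 0.7, 6.0], [0.3, -0.4, 5.0]]
pairs = [(project(p, identity, [0.0, 0.0, 0.0]), project(p, identity, shift))
         for p in points]
x1_bad, x2_bad = pairs[2]
pairs[2] = (x1_bad, [x2_bad[0], x2_bad[1] + 0.1, 1.0])  # simulate a mismatch
inliers, outliers = split_inliers(E, pairs, threshold=1e-6)
```

In a full RANSAC loop, a test of this kind would be applied to every candidate matrix estimated from a random minimal sample, and the candidate with the most inliers would be kept.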

Real Time Approach
Post-processed localization and mapping assists in estimating the driven path and computing a 3D point cloud of the environment after image acquisition. Nevertheless, in some cases a real time analysis is required, for instance for simultaneous localization and driving of forest vehicles. Such a method could help to keep the forest machine on a given track and consequently avoid further soil compression. In comparison to the approach of section 4.1, this real time algorithm computes the orientation of the images sequentially. That means the pose estimation is computed gradually, from image to image, and the current position is accumulated from the previous image positions. First, feature points are detected and feature descriptors are computed. Subsequently, the descriptors are matched to obtain homologous point pairs in an overlapping image pair. In order to improve the speed of this algorithm, SURF features are detected instead of SIFT. RANSAC, based on a computed fundamental matrix, helps to find the best solution for the orientation. Because the equation system is overdetermined, an additional adjustment of the relative orientation is performed; it uses all inliers determined with RANSAC and increases the accuracy of the parameters by minimizing the sum of squared residuals. The number of tie points between two images is essential for computing an accurate relative orientation. Tests with images that contain trees have shown that the number of corresponding points strongly depends on the resolution of the image (Fig. 3). This behavior stems from the size of the computed feature descriptor and from the rapidly changing background color around the features. It may therefore be helpful to resample the image to a smaller size (table 2). In this approach, image coordinates are direct observations, so a calibration of the camera is necessary and feature point positions have to be corrected for lens distortion and principal point shift.
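The sequential accumulation of the position from image to image can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: each relative pose (R_rel, t_rel) is assumed to describe the next camera in the frame of the current one, the translations are assumed to be already scaled (a monocular relative orientation alone does not provide scale), and the function names are hypothetical.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotate(r, v):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return [sum(r[i][j] * v[j] for j in range(3)) for i in range(3)]


def accumulate_track(relative_poses):
    """Chain relative poses into camera positions in the frame of image 0.

    relative_poses: list of (R_rel, t_rel), where t_rel is the position of
    camera k+1 in the frame of camera k and R_rel its relative rotation.
    """
    rotation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    position = [0.0, 0.0, 0.0]
    track = [position]
    for r_rel, t_rel in relative_poses:
        step = rotate(rotation, t_rel)            # step expressed in frame 0
        position = [position[i] + step[i] for i in range(3)]
        rotation = matmul3(rotation, r_rel)       # orientation of next camera
        track.append(position)
    return track


# Example: three forward steps of 0.5 units without rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
straight = accumulate_track([(identity, [0.0, 0.0, 0.5])] * 3)
```

Because every new position is built on all previous ones, an error in any single relative orientation is carried into all later positions.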

RESULTS AND DISCUSSION
The results of post (Sec. Bundler, utilized in the post processing method, uses a sparse bundle adjustment. Compared to the sequentially working real time approach, the estimated camera positions are more accurate, but the analysis can only start after the entire image recording is completed. The real time algorithm directly estimates the position and orientation after capturing a single image, but the error increases from image to image because of variance propagation.

Post Processing Approach
Unfortunately, Bundler provides no accuracy information for its calculations. Therefore, an external bundle adjustment, which imports the results of Bundler (3D points, 2D points, image orientation), has been used to compute the results again together with accuracy details (see table 3). All object points from Bundler are employed as control points in this bundle adjustment, and the previously obtained feature points are the observations; it is therefore more of a multi-image spatial resection adjustment.
parameter    mean standard deviation for 36 frames

Real Time Approach
As presented in table 4, the mean accuracy is comparable with that of the post processing method (section 5.1). However, the values shown are the accuracy for one image pair only. Because of variance propagation, the uncertainty of all following images increases. It is noticeable that the rotation values (ω, φ, κ) are worse than the translation values. The basic cause of this effect is a clustered tie point distribution: in some images, homologous points are very close together and concentrated in only one region of the image. As a result, the rotation parameters cannot be computed with the expected accuracy.
To avoid the problem of variance propagation, successive images have to be connected to each other. Detected and stored feature points can be used to obtain the relative orientation between two or more images. In that way, preceding images could be linked with current images to stabilize the pose estimation, as in (Davison, 2003) and (Davison et al., 2007). That approach is an advanced model of VSLAM with an Extended Kalman Filter (EKF). At the moment, our work does not support prediction or a SLAM algorithm. Nevertheless, the first results and existing works demonstrate the potential of this method, which we strive to develop further.
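The growth of uncertainty along the image sequence can be illustrated with a simplified error model. The sketch below assumes that the errors of the individual image pairs are independent and considers only one translation component, so that the variances simply add up; rotation errors, which are not modeled here, would make the growth faster. The function name and the numerical values are hypothetical and are not results of the experiments.

```python
import math


def accumulated_sigmas(step_sigmas):
    """Standard deviation of the accumulated position after each step.

    Assumes independent errors per image pair, so that variances add:
    sigma_n^2 = sigma_1^2 + ... + sigma_n^2.
    """
    total_variance = 0.0
    result = []
    for sigma in step_sigmas:
        total_variance += sigma ** 2
        result.append(math.sqrt(total_variance))
    return result


# Hypothetical example: 35 image pairs, each with 0.01 m standard deviation.
growth = accumulated_sigmas([0.01] * 35)
final_sigma = growth[-1]  # equals 0.01 * sqrt(35) under this model
```

Under this model the uncertainty grows with the square root of the number of image pairs, which is why linking the current image to earlier images is needed for longer tracks.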

CONCLUSIONS AND FUTURE WORK
The first results of this pilot study show that the theoretical approach is feasible and that it is worthwhile to pursue it further. A vision-based method can estimate the position of a forest harvester with adequate accuracy. The post-processing approach in particular delivers stable and precise results. The pose estimation accuracies below one cm obtained in the experiments with post and real time processing exceed expectations and form the basis for further research. Furthermore, the achieved results meet the accuracy requirement of about 15 cm, i.e. half of a tire width, for forest navigation that avoids soil compression. Influences on accuracy, such as strong wind and fast movements, have not been analyzed yet. The point cloud calculated by Bundler and PMVS2 can be used to create a virtual forest. Current approaches to automatic tree detection in laser scanner data ((Heurich and Weinacker, 2004), (Schilling et al., 2011)) can be applied to this data. In this context, another possible navigational aid could be a virtual forest model in which cylinders have been fitted to appropriate point cloud subregions. Assuming all trees of this virtual forest have at least 2D coordinates, a camera on a harvester and corresponding software would have to solve the assignment of trees between object and image space and could then compute the camera position by spatial resection.

Figure 3: Different number of feature points, due to different resolution

Table 2: Number of corresponding feature points with different image resolutions. Respective image examples: Fig. 3

image resolution    number of corresponding feature points
100%                61
75%                 167
50%                 197
25%                 97

camera parameters. Outliers are detected and eliminated, and all residuals are minimized by the method of least squares. As a result, the remaining errors are distributed in equal parts to each camera position, and the existing points of view exhibit only small errors.
Figure 4: results of Bundler and PMVS2

Table 1: Data of camera and lens

Shutter time and aperture have been chosen so that motion blur is avoided. Pictures have been taken at the full resolution of 4256 x 2832 pixels, but had to be scaled down to a smaller size (see section 4) to comply with the requirements of the feature detection algorithms.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012 XXII ISPRS Congress, 25 August - 01 September 2012, Melbourne, Australia

Table 3: Mean accuracy of camera orientation from Bundler

This computation achieved its results on the basis of an excellent image acquisition without any interruptions or difficulties. That means there was no interruption of the image sequence, there were no obstacles, the velocity was constant and there was no strong wind. Thus, these results are optimistic, but achievable. Bundler automatically computes a highly accurate pose estimation based on a small number of parameters (focal length and image size). It detects natural feature points with an accuracy below one pixel in image space and can therefore be employed in forest applications as well. The obtained results are encouraging and demonstrate that the post processing method is indeed feasible. PMVS provides a patch-based algorithm that uses the results from Bundler to create a dense point cloud of the environment. Based on an extended multi-view patch correlation, PMVS increases the number of points fourfold compared to Bundler for the given image sequence of 36 frames.