ACCURACY EVALUATION OF STEREO VISION AIDED INERTIAL NAVIGATION FOR INDOOR ENVIRONMENTS

Abstract: Accurate knowledge of position and orientation is a prerequisite for many applications regarding unmanned navigation, mapping, or environmental modelling. GPS-aided inertial navigation is the preferred solution for outdoor applications. Nevertheless, a similar solution is needed for navigation tasks in difficult environments with erroneous or no GPS data. Therefore, a stereo vision aided inertial navigation system is presented which is capable of providing real-time local navigation for indoor applications. A method is described to reconstruct the ego motion of a stereo camera system aided by inertial data. This, in turn, is used to constrain the inertial sensor drift. The optical information is derived from natural landmarks, extracted and tracked over consecutive stereo image pairs. Using inertial data for feature tracking effectively reduces computational costs and at the same time increases reliability due to constrained search areas. Mismatched features, e.g. at repetitive structures typical for indoor environments, are avoided. An Integrated Positioning System (IPS) was deployed and tested on an indoor navigation task. IPS was evaluated for accuracy, robustness, and repeatability in a common office environment. In combination with a dense disparity map, derived from the navigation cameras, a high-density point cloud is generated to show the capability of the navigation algorithm.


INTRODUCTION
Many applications for indoor as well as outdoor environments require an accurate navigation solution. GPS-aided inertial navigation is widely used to provide position and orientation for airborne and automotive tasks. Although this works very well, it has major weaknesses in difficult environments with erroneous or no GPS data, e.g. urban areas, forested areas, or indoor environments, as needed for robotic applications or indoor 3D reconstruction tasks. Due to errors inherent to inertial sensors, the pure integration of inertial data will lead to unbounded error growth, resulting in an erroneous navigation solution. Reasonable measurements of an additional sensor are needed to restrain these errors. Some proposed solutions require active measurements, e.g. radar or laser range finders, or local infrastructure which has to be established first (Zeimpekis et al., 2003). On the other hand, vision can provide enough information from a passive measurement to serve as a reference. As no local infrastructure or external references are used, it is suitable for non-cooperative indoor and outdoor environments. A stereo-based approach was preferred to obtain 3D information from the environment, which is used for ego-motion determination. Both inertial measurements and optical data are fused within a Kalman filter to provide an accurate navigation solution. Additional sensors can be included to achieve higher precision, reliability, and integrity. The Integrated Positioning System (IPS) includes a hardware concept to guarantee synchronized sensor data as well as a software design for real-time data handling and data processing (Grießbach et al., 2012). IPS was evaluated for accuracy, robustness, and repeatability in a common office environment. In combination with a dense disparity map, derived from the navigation cameras, a high-density point cloud is generated to show the capability of the navigation algorithm.

INTEGRATED POSITIONING SYSTEM (IPS)
A multi-sensor navigation system for the determination of position and attitude of mobile devices was developed. The navigation is based on a strapdown algorithm which integrates inertial measurements to obtain position and attitude. A method is described to reconstruct the ego motion of a stereo camera system aided by inertial data. This, in turn, is used to constrain the inertial sensor drift. Both inertial measurements and optical data are fused within a Kalman filter to provide an accurate navigation solution. To produce real-time high-level information from low-level sensor data, a hardware concept to guarantee synchronized sensor data and a software framework to implement hierarchic data flows is needed.

Inertial Navigation
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-4/W4, 2013

Inertial navigation systems (INS) consist of an inertial measurement unit (IMU), containing a triad of gyroscopes and accelerometers, and a computing unit to integrate the measurements. Figure 1 shows the integration of the IMU signals by means of the well-known strapdown mechanization (Titterton and Weston, 2004). The superscripts b and n stand for the body-frame and the navigation-frame respectively. The navigation frame has an arbitrary origin, with its axes aligned to the local tangent plane. Some difficulties arise when integrating the measured angular velocities $\hat{\omega}^b$ and accelerations $\hat{a}^b$. Besides systematic scaling errors and errors from axis misalignment, which can be corrected beforehand, the bias terms $\omega^b_{bias}$, $a^b_{bias}$ are unknown from switch-on to switch-on and also vary with temperature and time. This leads to a strong drift in attitude $q^n_b$, velocity $v^n$, and position $s^n$ if left uncompensated. First, the quaternion update is calculated as

$q^n_{b,t_{k+1}} = q^n_{b,t_k} \circ \Delta q_{t_k}$,

with $\Delta q_{t_k}$ the incremental rotation built from $\omega^b \Delta t$, the corrected angular rate $\omega^b = \hat{\omega}^b - \omega^b_{bias}$, and $q^n_{b,t_k}$ representing the rotation from body-frame to navigation-frame at time $t_k$. The operator $\circ$ describes a quaternion multiplication. For the velocity update, the change in velocity is calculated from the corrected acceleration measurement $a^b = \hat{a}^b - a^b_{bias}$,

$\Delta v^b = a^b \, \Delta t$,

and rotated to the navigation-frame,

$\Delta v^n = q^n_b \circ \Delta v^b \circ (q^n_b)^{-1}$.

Now it can be corrected for gravitation $g$ and added to the previous velocity.
For the interval $t_k$ to $t_{k+1}$ the position integral is approximated with the trapezoidal rule,

$s^n_{t_{k+1}} = s^n_{t_k} + \frac{\Delta t}{2} \left( v^n_{t_k} + v^n_{t_{k+1}} \right)$.
A detailed derivation of the strapdown equations can be found in (Titterton and Weston, 2004). Since the used gyroscopes do not provide a sufficiently accurate measurement to resolve the earth rotation, it is not possible to find the north direction. Because there is also no global position reference available, the earth rotation is neglected. For most indoor applications, with low velocities and short times of pure inertial integration, this is acceptable.
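The strapdown update described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the quaternion convention, gravity sign, and function names are assumptions, and bias correction is assumed to have been applied to the inputs already.

```python
# Minimal strapdown propagation sketch (illustrative, not the IPS code).
# State: attitude quaternion q (body->nav, [w, x, y, z]), velocity v,
# position s. Earth rotation is neglected, as stated in the text.
import numpy as np

def quat_mult(p, q):
    """Hamilton product p * q."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def rotate(q, v):
    """Rotate vector v by quaternion q via q * (0, v) * q_conjugate."""
    qc = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mult(quat_mult(q, np.concatenate(([0.0], v))), qc)[1:]

def strapdown_step(q, v, s, omega_b, accel_b, dt,
                   g=np.array([0.0, 0.0, -9.81])):
    # attitude update: incremental quaternion from the rotation vector
    theta = omega_b * dt
    angle = np.linalg.norm(theta)
    if angle > 1e-12:
        axis = theta / angle
        dq = np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))
    else:
        dq = np.array([1.0, 0.0, 0.0, 0.0])
    q_new = quat_mult(q, dq)
    # velocity update: rotate specific force to nav frame, correct gravity
    dv_n = rotate(q_new, accel_b * dt) + g * dt
    v_new = v + dv_n
    # position update: trapezoidal rule over the interval
    s_new = s + 0.5 * (v + v_new) * dt
    return q_new, v_new, s_new
```

For a stationary unit the measured specific force cancels gravity, so velocity and position stay constant, which is a quick sanity check of the sign conventions chosen here.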

Stereo Vision
Visual information of a stereo system can be used for ego-motion estimation or visual odometry (Olson et al., 2003, Nistér et al., 2004). Changes in position and attitude between two consecutive image pairs are estimated from homologous points. At a fixed frame rate this corresponds to measurements of velocity and angular velocity. To measure in images, a precise knowledge of the geometric camera calibration is assumed (Grießbach et al., 2008).
In projective space, the mapping of a homogeneous object point $M \in P^3$ to an image point $m \in P^2$ is defined as

$m \simeq P M$,

where $P$ is a $3 \times 4$ projection matrix (Hartley and Zisserman, 2000) consisting of the parameters of the interior and exterior orientation of the camera,

$P = K [R \mid t]$,

with $R$, $t$ describing the rotation matrix and translation of the exterior orientation, and the camera matrix

$K = \begin{pmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{pmatrix}$

containing the focal length $f$ and the principal point $u_0$, $v_0$.
Furthermore, lens distortion has to be considered. The very common radial distortion model (Brown, 1971) is used, covering pincushion and barrel distortion, expressed as

$x_d = x \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right), \quad y_d = y \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right)$,

with $r^2 = x^2 + y^2$, where $x$, $y$ are normalized image coordinates calculated from the image coordinates $m = (u, v, 1)^T$.
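The projection and distortion chain can be sketched as below. The camera matrix layout follows the definitions above; the distortion coefficients `k` are illustrative placeholders, not calibration values from the paper.

```python
# Sketch of the pinhole projection m ~ K [R | t] M with Brown radial
# distortion applied in normalized image coordinates. Coefficients k1..k3
# default to zero (no distortion); real values come from calibration.
import numpy as np

def project(M, K, R, t, k=(0.0, 0.0, 0.0)):
    """Project 3D point M (shape (3,)) to distorted pixel coordinates."""
    Xc = R @ M + t                        # camera-frame coordinates
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]   # normalized image coordinates
    r2 = x * x + y * y
    d = 1.0 + k[0] * r2 + k[1] * r2**2 + k[2] * r2**3  # radial factor
    f, u0, v0 = K[0, 0], K[0, 2], K[1, 2]
    return np.array([f * d * x + u0, f * d * y + v0])
```

With all coefficients zero this reduces to the plain pinhole model, which makes the distortion term easy to verify in isolation.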
For image-based pose estimation, features have to be detected and tracked over consecutive frames. Natural landmarks such as corners, isolated points, or line endings are found by analysing the autocorrelation matrix of image intensity gradients as proposed by (Harris and Stephens, 1988). The extracted features are matched using normalized cross-correlation to find homologous image points. Figure 2 shows the two different matching steps. First, features in the left image at time $t_k$ are matched to the right image at time $t_k$ (intra-matching). By using the epipolar constraint, the search area is reduced to a single line. 3D object coordinates are then reconstructed by applying the interior orientation and the relative exterior orientation of the cameras. The next step is to match the left and right images at time $t_k$ to the left and right images at time $t_{k+1}$ (inter-matching). Having the strapdown navigation solution from inertial measurements, it is possible to predict the relative change in position and attitude $\Delta R$, $\Delta t$. With the triangulated object points $M$, the expected feature positions $m$ at time $t_{k+1}$ as well as their uncertainties are calculated. The image point uncertainties are used to determine the size of the search area.
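The inertially aided prediction step can be sketched as follows. The fixed pixel uncertainty used to size the search window is a simplifying assumption here; the text describes propagating the actual point uncertainties.

```python
# Sketch of inertially aided feature prediction: a point triangulated at
# time t_k is transformed by the strapdown-predicted relative pose
# (dR, dt_vec) and re-projected to bound the correlation search window
# at t_{k+1}. A constant pixel sigma stands in for the full uncertainty
# propagation used by the system.
import numpy as np

def predict_feature(M_k, dR, dt_vec, K, sigma_px=3.0, n_sigma=3.0):
    Mc = dR @ M_k + dt_vec                # point in camera frame at t_{k+1}
    x, y = Mc[0] / Mc[2], Mc[1] / Mc[2]   # normalized coordinates
    u = K[0, 0] * x + K[0, 2]
    v = K[1, 1] * y + K[1, 2]
    half = n_sigma * sigma_px             # half-width of the search window
    return (u, v), (u - half, u + half, v - half, v + half)
```

Restricting correlation to this window is what keeps the matching cheap and suppresses mismatches at repetitive structures.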
With the found homologous image points, the relative change in pose between both image pairs can be estimated. This is done by minimizing the distance of the image points $m$ from time $t_{k+1}$ to the transformed and back-projected object points from time $t_k$:

$\min_{\Delta R, \Delta t} \sum_i \left\| m_{i,t_{k+1}} - \pi\left(\Delta R \, M_{i,t_k} + \Delta t\right) \right\|^2$,

where $\pi$ denotes the projection into the image. A RANSAC approach (Fischler and Bolles, 1981) is needed to filter out mismatched features and achieve a stable solution.
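A RANSAC loop of this kind can be sketched as below. Note a substitution made for brevity: the paper minimizes the image-space reprojection error above, whereas this sketch scores hypotheses on 3D-3D residuals and uses the closed-form Kabsch alignment as the minimal solver.

```python
# RANSAC sketch for the relative pose between two frames, given 3D point
# correspondences P (at t_k) and Q (at t_{k+1}). Illustrative only: the
# actual system scores reprojection error in the image, not 3D residuals.
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid transform (R, t) minimizing ||R P_i + t - Q_i||."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    # reflection correction keeps det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cQ - R @ cP

def ransac_pose(P, Q, iters=100, thresh=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)  # minimal sample
        R, t = kabsch(P[idx], Q[idx])
        resid = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = resid < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final refit on the consensus set
    return kabsch(P[best_inliers], Q[best_inliers]) + (best_inliers,)
```

The final refit over all inliers plays the role of the least-squares minimization in the text, with the RANSAC consensus set excluding mismatched features.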

Data Handling
The low-level sensor fusion is realized by an FPGA board that has different sets of add-ons attached. Depending on the application, major interface standards of low-bandwidth sensors like SPI, CAN, RS232, and digital inputs/outputs are supported. The data sampling process of external sensors usually takes place asynchronously to the capturing device. This makes it difficult for multi-sensor systems to align the different measurements to one common time line during the fusion process. The custom hardware used within the presented multi-sensor platform allows precise and deterministic synchronization by referencing all sensor data to one local time scale that is kept by the FPGA. A high-precision clock generates timestamps to which all sensor communication is referenced.
The main objective for the data handling software is to set up a data processing chain from low-level data to high-level information (e.g. from sensor measurements to a navigation solution). Ideally, a particular task is encapsulated in a container, having only defined inputs and outputs. If input data is available, the task is immediately executed and its result sent to an output buffer. It is important to have buffer capabilities, as a succeeding task may not be ready to receive new data at the moment it is generated. Combining those containers allows for flexible and efficient data handling and data processing.
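The container idea can be sketched with standard queues and threads; the class and function names here are illustrative, not the IPS framework API.

```python
# Sketch of the container concept: each stage has defined inputs and
# outputs, runs as soon as input arrives, and writes to a bounded queue
# so a slow consumer does not lose data from a fast producer.
import queue
import threading

class Container:
    def __init__(self, task, capacity=64):
        self.task = task
        self.inbox = queue.Queue(maxsize=capacity)
        self.outbox = queue.Queue(maxsize=capacity)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self.inbox.get()
            if item is None:            # sentinel: shut the stage down
                self.outbox.put(None)
                return
            self.outbox.put(self.task(item))

def connect(src, dst):
    """Forward src.outbox into dst.inbox (one edge of the chain)."""
    def pump():
        while True:
            item = src.outbox.get()
            dst.inbox.put(item)
            if item is None:
                return
    threading.Thread(target=pump, daemon=True).start()
```

Chaining two containers gives a minimal processing chain in the spirit of Figure 3: data flows stage to stage, with each buffer decoupling producer and consumer rates.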

Data Fusion
To combine the different sensor outputs into a navigation solution, a derivative-free Kalman filter is used. The main advantages of this filter type are its third-order accuracy for Gaussian inputs and the easy integration of the mostly nonlinear equations. The system's ego motion is modeled with the assumption of constant angular rate and acceleration over a discrete period of time. The filter was formulated in total state space with a 16-component state vector s, comprising the rotation in terms of a quaternion as well as velocity, raw angular velocity, angular-velocity bias, and acceleration bias, each with three degrees of freedom.
Figure 4 shows the combination of the Kalman filter and the optical system, which provides incremental attitude and position updates.
Receiving IMU, camera, or inclinometer measurements, the filter cycle is completed, including a check for the feasibility of the data (Grießbach et al., 2010). The corresponding observation equations for the visual data are defined as

$v^b_{cam,t_k} = v^b_{t_k} + \eta_{t_k}, \quad \omega^b_{cam,t_k} = \omega^b_{t_k} + \zeta_{t_k}$,

where $v^b_{cam,t_k}$ and $\omega^b_{cam,t_k}$ denote the estimated velocity and angular velocity of the visual measurement. The variables $\eta$, $\zeta$ describe their zero-mean Gaussian white-noise sequences retrieved from the estimation process.
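Because the visual observations measure state components directly, the measurement update reduces to a standard Kalman correction with a selection matrix H. Note a simplification: the paper uses a derivative-free (sigma-point) filter, so this linear update is for illustration only, and the state layout shown is assumed.

```python
# Linear Kalman measurement update sketch for direct observations of
# body-frame velocity and angular rate (6 observed components). The
# derivative-free filter in the paper handles the nonlinear parts; for a
# direct observation the update step has this standard form.
import numpy as np

def kf_update(x, P, z, H, Rm):
    """x <- x + K (z - H x), with Joseph-free covariance update."""
    S = H @ P @ H.T + Rm                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

The innovation covariance S is also what a feasibility check can gate on: an innovation far outside S indicates an implausible visual measurement.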

SENSOR HEAD
This chapter gives an overview of the applied sensors and their main characteristics. The default configuration consists of a microelectromechanical (MEMS) IMU and two cameras forming a stereo system. Additionally, an inclination sensor is added to obtain a more accurate system initialization. Other sensors capable of providing position, attitude, or their derivatives, e.g. a GPS receiver or barometer, may be included for redundancy and higher accuracy. For bad light conditions, two LEDs providing near-infrared illumination are also included, but they were not used for the indoor experiment.
Figure 5: IPS prototype

MEMS sensors have the advantage of being small, lightweight, and low-cost in comparison to IMUs with fibre-optic gyros (FOG) or ring-laser gyros (RLG). Although there is reasonable progress in the development of MEMS gyros, FOG and RLG are still several orders of magnitude better with regard to bias stability and angle random walk (Schmidt, 2010). MEMS gyros are also sensitive to accelerations, introducing an additional error to the system. In fact, pure inertial navigation with MEMS IMUs will give very poor performance and can be used only for short periods of time. Table 1 shows the specification of the used IMU. Two industrial panchromatic CCD cameras (see table 2) form the stereo system. Together with the IMU and the inclination sensor, the cameras are mounted on an optical bench to achieve a stable setting. A stereo baseline of 0.2 m gives a usable stereo range from 0.55 m, with an image overlap of 80%, to about 10 m.
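The quoted baseline and range allow a quick back-of-envelope check of the stereo geometry via Z = fB/d and the first-order depth uncertainty sigma_Z = Z²/(fB) · sigma_d. The focal length in pixels and the disparity noise below are assumed illustrative values, not taken from the calibration.

```python
# Back-of-envelope stereo geometry for the B = 0.2 m baseline.
# f_px and sigma_d_px are assumed example values, not the paper's
# calibration: depth error grows quadratically with distance, which is
# why the usable range tops out at roughly 10 m.

def stereo_depth(disparity_px, f_px=800.0, baseline_m=0.2):
    """Depth from disparity: Z = f * B / d."""
    return f_px * baseline_m / disparity_px

def depth_sigma(depth_m, sigma_d_px=0.5, f_px=800.0, baseline_m=0.2):
    """First-order depth uncertainty: sigma_Z = Z^2 / (f * B) * sigma_d."""
    return depth_m**2 / (f_px * baseline_m) * sigma_d_px
```

With these example numbers, a point at 1 m has centimetre-level depth uncertainty while a point at 10 m is already at the decimetre level, matching the qualitative range limit stated above.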

RESULTS
IPS was evaluated for accuracy, robustness, and repeatability in a common office environment. Lacking a ground truth, a closed-loop scenario was chosen to evaluate the relative error in attitude and position. The run shown in figures 6, 7, and 8 has an absolute path length of 317 meters covering 4 floors. This includes the staircase, narrow corridors, and a wider entrance area with very different illumination and feature characteristics. The run was repeated 21 times under real-life conditions, at a normal walking speed of 1.5 m/s, during working hours, with people interfering with the visual measurements. To avoid systematic errors, the system was carried by varying people and restarted after every trial. Figure 9 shows the distribution of the 2D position errors for each pair of axes. The translational errors are more or less evenly spread over all axes. The averaged closed-loop rotation and translation errors over all 21 trials are summarized in table 4. Due to the inclination sensor supporting the horizontal axes in times of hold-up, their errors remain comparatively small. The remaining error is partly caused by phases where no or few features could be seen, resulting in missing or low-quality vision data. The consequence is an increased error growth from integrating the IMU measurements. Such difficult situations occur through changing lighting conditions, low texturing in some areas, or, for example, in narrow stairways with all objects too close for the stereo system. Another error source can be found in the IMU scale factors, which are greater than 1% of the specific range (see table 1).

3D Point Cloud Application
To show the capability of the navigation algorithm, a high-density point cloud, derived from the stereo cameras, is generated. In a parallel processing chain the stereo image data is used for dense stereo matching with Semi-Global Matching (SGM), a very powerful but time-consuming method (Hirschmüller, 2005). To fulfil the real-time requirements, the algorithm was implemented on a Graphics Processing Unit (Ernst and Hirschmüller, 2008).
In combination with the navigation solution, the 3D point cloud of each image pair is transformed to the navigation-frame and merged to obtain a high-density raw point cloud. Figure 10 shows an example point cloud. This process is carried out off-line without further data processing.

Figure 2: Paths for intra-frame and inter-frame matching

Figure 3: Sensor data processing chain

Figure 9: 2D position error of 21 runs for all pairs of axes

Figure 10: View of generated raw 3D point cloud

Table 1: IMU ADIS-16405 specification

Within the usable stereo range, triangulation yields reasonable uncertainties. The images are binned by a factor of two to allow real-time data processing at 15 frames per second on a standard notebook. This includes image logging, real-time image processing, and data fusion.

Table 2: Prosilica GC1380H specification

Additionally, a two-axis inclination sensor is used to give an absolute reference measurement to the local tangent plane, which defines two axes of the local navigation frame.

Table 4: Closed loop errors