OPTO-ACOUSTIC DATA FUSION FOR SUPPORTING THE GUIDANCE OF REMOTELY OPERATED UNDERWATER VEHICLES (ROVs)

Remotely Operated Underwater Vehicles (ROVs) play an important role in a number of operations conducted in shallow and deep water (e.g. exploration, survey, intervention), in several application fields such as marine science, offshore construction, and underwater archeology. ROVs are usually equipped with different imaging devices, both optical and acoustic. Optical sensors are able to generate better images at close range and in clear water conditions, while acoustic systems are usually employed in long-range acquisitions and do not suffer from the presence of turbidity, a well-known cause of coarser resolution and harder data extraction. In this work we describe the preliminary steps in the development of an opto-acoustic camera able to provide an on-line 3D reconstruction of the acquired scene. Taking full advantage of the benefits arising from opto-acoustic data fusion techniques, the system was conceived as a support tool for ROV operators during navigation in turbid waters, or in operations conducted by means of mechanical manipulators. The paper presents an overview of the device, an ad-hoc methodology for the extrinsic calibration of the system, and custom software developed to control the opto-acoustic camera and supply the operator with visual information.


INTRODUCTION
During the last few years, there has been growing interest in the development of efficient methodologies and systems in the underwater research area, in order to deal with the challenging problems of monitoring, survey and data gathering. In the field of cultural heritage, the scientific community and the international Cultural Heritage safeguarding bodies have established the need to promote, protect, and preserve, possibly in-situ, the submerged artefacts and sites. The CoMAS project, started in 2011, aims to develop new materials and technologies for supporting the restoration, conservation and documentation of underwater cultural heritage. The project goal is the definition of a conservation methodology that includes several stages: documentation, cleaning, restoration, maintenance and monitoring. One of the most challenging aspects of this project is the set-up of a special ROV (Remotely Operated Vehicle) devoted to the monitoring, routine cleaning (through the use of a custom mechanical manipulator), and 3D mapping of the submerged archeological structures. These specific tasks require the accurate localization of the vehicle within the operational environment and, above all, a clear representation of the underwater scene in the presence of low visibility conditions.
ROVs are usually equipped with different imaging devices, both optical and acoustic. The acoustic systems typically give good results in long-range acquisition and do not suffer from water turbidity, but the resulting data are affected by low resolution and accuracy. The optical systems, in contrast, are more suited to close-range acquisitions and allow for gathering high-resolution data and target details, but the results are constrained by a limited visibility range. Hence, the fusion of data captured by these two types of systems stands as a promising technique in underwater applications, as it allows for compensating their respective limitations. Since the two categories of sensors are based on different physical principles, they provide, in general, different information about the scene to be acquired, and different methods are employed to process the data. Therefore, the integration of the two types of data gathered from these sensors is a field that calls for new solutions. Despite the difficulty of combining two modalities that operate at different resolutions, technological innovation and advances in acoustic sensors have progressively allowed for the generation of good-quality high-resolution data suitable for integration, and the related design of new techniques and systems for underwater scene reconstruction.
The aim of this work is to describe the preliminary steps in the development of an opto-acoustic camera able to provide on-line 3D reconstructions. Taking full advantage of the benefits arising from opto-acoustic data fusion techniques, the system was conceived as a support tool for ROV operators during navigation in turbid waters or in operations conducted by means of mechanical manipulators. The operator will be able to choose the most suitable visualization technique (optical, acoustic) according to the working conditions or, if needed, to fuse both of them in a single image where the missing parts of the acquired optical data are covered by the acoustic acquisitions.
The remainder of this paper is organized as follows: Section 2 presents the state of the art concerning the integration of optical and acoustic sensors in the field of underwater applications, and the solutions adopted to fuse the data. Section 3 deals with the system configuration and the hardware specifications of the sensors that constitute the opto-acoustic camera. In Section 4 we formalize the extrinsic calibration problem of the system and provide a methodology to solve it. The final section presents the main features and a first prototype of the user interface.

RELATED WORKS
Multisensor data fusion is a technology aimed at combining information coming from several sources, in order to form a unified picture.

EXTRINSIC CALIBRATION OF THE SYSTEM
One of the problems faced during the development of the proposed 3D opto-acoustic camera was the alignment of the optical and acoustic 3D data, known as the sensor calibration problem, whose solution allows data from one sensor to be associated with the corresponding data of the other sensor. Data alignment, that is, the transformation of the data from each local frame into a reference frame common to both sensors, is of critical importance for the successful deployment of the opto-acoustic fusion system: the 3D data generated by such sensors are so different from each other that they must be precisely matched in order to detect the spatial coordinates of a given point belonging to an object in both representations.
Up to now, the works presented in the literature concerning the integration between several types of sonar (single beam sounder, multibeam, 3D acoustic camera) and optical cameras adopt a sensor fusion approach that is mapping-oriented, according to the classification proposed in (Nicosevici et al., 2008). This means that the data acquired from the two sensors are described through geometric relationships (position and orientation), and the data fusion is performed by means of geometrical correspondences and registration. The alignment of 3D data is usually solved by performing the extrinsic calibration of the integrated system, i.e. by searching for the fixed (but unknown) rigid transformation which relates the local reference frame of the stereo optical subsystem with that of the acoustic camera, thereby obtaining the relative pose (position and orientation) between the two sensors.
Now, let us assume that the two cameras have already been calibrated independently. In particular, with regard to the stereo optical system, the assumption is that both the intrinsic parameters of each optical camera (focal length, principal point, optical distortions and skew) and the extrinsic parameters of the stereo pair (relative position between the two reference frames) have been determined. Downstream of this calibration process, it is possible to know the 3D coordinates of any point of the scene acquired by the optical subsystem with respect to its local reference frame (typically attached to the left camera of the stereo pair). Therefore, assuming that a point $\mathbf{p}_o = (x_o, y_o, z_o)^T$ of the optical reference frame corresponds to a point $\mathbf{p}_a = (x_a, y_a, z_a)^T$ of the acoustic reference frame, the rigid transformation that relates the two coordinate systems may be expressed as:
$$\mathbf{p}_o = R\,\mathbf{p}_a + \mathbf{t} \tag{1}$$
where
$R$ = 3 x 3 orthonormal rotation matrix from the acoustic camera to the stereo optical camera reference frame
$\mathbf{t}$ = 3D translation vector from the acoustic camera to the stereo optical camera reference frame
Equivalently, indicating with $\tilde{\mathbf{p}}_o$ and $\tilde{\mathbf{p}}_a$ the homogeneous coordinates of the points $\mathbf{p}_o$ and $\mathbf{p}_a$ respectively, the aforementioned relation can be expressed as:
$$\tilde{\mathbf{p}}_o = T\,\tilde{\mathbf{p}}_a$$
where
$T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$ = 4 x 4 homogeneous transformation matrix from the acoustic camera to the stereo optical camera reference frame
Therefore, our goal is to develop a methodology for the calculation of the extrinsic parameters $R$ and $\mathbf{t}$, which define the orientation and the position of the acoustic subsystem with respect to the optical one, respectively. Usually, the calibration methods are highly dependent on the sensors that compose the system and especially on the type of data they provide. In our case, the particular nature of the data acquired by the systems (3D point clouds) makes it possible to implement a methodology for extrinsic calibration based on a "direct" computation of the rigid transformation matrix that relates the reference systems associated with the optical and acoustic sensors. This is achieved through a simple registration of each pair of optical and acoustic 3D point clouds of a planar pattern, which is used as a target in the calibration procedure and acquired in different poses. Finally, the ability to obtain multiple estimates of the transformation matrix allows for implementing an appropriate optimization technique, in order to obtain more accurate results. The implemented methodology is composed of two separate data-processing threads, one for each of the two sensory channels, which eventually merge in the last stages of the proposed solution.
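As an illustration of the rigid transformation between the acoustic and optical frames, the following minimal Python/NumPy sketch builds the 4 x 4 homogeneous matrix from a rotation and a translation, applies it to acoustic points, and recovers the rotation and translation from matched points with the standard SVD-based (Kabsch) least-squares solution. The function names and the synthetic data are hypothetical, and the sketch assumes that point correspondences are already known; it is not the registration pipeline adopted in this work, which operates on point clouds without known correspondences.

```python
import numpy as np


def make_homogeneous(R, t):
    """Build the 4x4 matrix [[R, t], [0, 1]] from a rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def apply_transform(T, points):
    """Map an (N, 3) array of acoustic-frame points into the optical frame."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]


def estimate_rigid_transform(acoustic_pts, optical_pts):
    """Least-squares R, t such that optical ~= R @ acoustic + t (Kabsch method).

    Assumption: row i of both arrays is the same physical point. Real point
    clouds would need a correspondence search (e.g. inside an ICP loop) first.
    """
    ca = acoustic_pts.mean(axis=0)
    co = optical_pts.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (acoustic_pts - ca).T @ (optical_pts - co)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (determinant -1) in the SVD solution.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = co - R @ ca
    return R, t


# Synthetic check: a known 30 degree rotation about z plus a translation.
rng = np.random.default_rng(0)
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.10, -0.05, 0.30])
acoustic = rng.uniform(-1.0, 1.0, size=(50, 3))
optical = acoustic @ R_true.T + t_true

R_est, t_est = estimate_rigid_transform(acoustic, optical)
T_est = make_homogeneous(R_est, t_est)
```

With noise-free synthetic data the estimate should reproduce the ground truth to numerical precision; with real acoustic and optical clouds the same closed-form step would typically serve only as the inner update of an iterative registration.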
Starting from the synchronous acquisition of the n poses of the calibration panel during the early stage of the process, the entire methodology aims to obtain n pairs of 3D point clouds, where the i-th pair (i = 1, ..., n) is formed by the optical point cloud $C_{o,i}$ and the acoustic point cloud $C_{a,i}$, in order to calculate, by means of coarse and fine registration algorithms, n estimates $T_i$ of the rigid transformation matrix. At the end of the process, the final transformation matrix $T^*$ is obtained by processing the dataset composed of the n transformation matrices $T_i$ obtained downstream of the previous registration stage.
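The step that combines the n estimates into a single transformation can be sketched as follows. The optimization technique is not specified here, so this example substitutes a simple assumption: the translations are averaged arithmetically, and the rotations are averaged by projecting the mean rotation matrix back onto the set of rotations through an SVD (the chordal mean). The function name `average_transforms` and the synthetic estimates are hypothetical.

```python
import numpy as np


def average_transforms(transforms):
    """Fuse a sequence of 4x4 rigid transforms into a single rigid transform.

    Assumption: plain averaging stands in for the optimization step, which
    the text does not specify.
    """
    stack = np.asarray(transforms, dtype=float)
    t_mean = stack[:, :3, 3].mean(axis=0)
    # The element-wise mean of rotation matrices is not a rotation in general,
    # so it is projected onto the nearest rotation matrix.
    M = stack[:, :3, :3].mean(axis=0)
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))
    R_mean = U @ np.diag([1.0, 1.0, d]) @ Vt
    T = np.eye(4)
    T[:3, :3] = R_mean
    T[:3, 3] = t_mean
    return T


def rotation_z(deg):
    """Rotation matrix about the z axis, used only to build synthetic data."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])


# Three synthetic estimates scattered symmetrically around a 30 degree rotation.
estimates = []
for deg, t in [(29.0, [0.09, -0.05, 0.31]),
               (30.0, [0.10, -0.05, 0.30]),
               (31.0, [0.11, -0.05, 0.29])]:
    T_i = np.eye(4)
    T_i[:3, :3] = rotation_z(deg)
    T_i[:3, 3] = t
    estimates.append(T_i)

T_star = average_transforms(estimates)
```

A robust variant (for instance, discarding estimates far from the median before averaging) may be preferable when some panel poses register poorly, but that choice is outside what the text states.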

Acoustic image processing
The 3D image provided by the acoustic camera can be corrupted either by false reflections caused by the secondary lobes of the receiving array or by the noise present in the acquisition phase of the backscattering signals. The latter is modelled as speckle noise. The secondary lobes are responsible for the blurring of the object, while the speckle noise causes a low response or no response at all within the object itself. It is therefore evident that the operations of filtering (noise reduction and the elimination of possible outliers) and segmentation (differentiation of objects and background in the observed scene) are to be considered as preliminary and mandatory steps for the execution of all fusion algorithms to be applied to this specific type of data (Murino et al., 2000a). While in the literature there are a number of algorithms for filtering and segmentation with variable results and degrees of automation (Murino, 2001b), the solution adopted in this calibration method for processing the acoustic 3D point clouds representing the calibration panel in its different poses relies on a completely manual filtering and segmentation procedure, performed through the open source software CloudCompare (CloudCompare, 2014), as the implementation of an automated procedure would require further, more focused research.
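For illustration only, the outlier-elimination part of the filtering step could be automated with a statistical filter such as the one sketched below in Python/NumPy. This is not the manual CloudCompare procedure adopted in this work: it is a generic k-nearest-neighbour criterion, and the function name, the parameters `k` and `std_ratio`, and the synthetic panel are all assumptions.

```python
import numpy as np


def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is unusually large.

    Brute-force pairwise distances: adequate for a few thousand points only.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # After sorting, column 0 is the zero distance of each point to itself.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    mean_knn = knn.mean(axis=1)
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    keep = mean_knn <= threshold
    return points[keep], keep


# Synthetic planar panel (20 x 20 points, 5 cm spacing) with slight depth noise.
rng = np.random.default_rng(1)
gx, gy = np.meshgrid(np.arange(20) * 0.05, np.arange(20) * 0.05)
panel = np.column_stack([gx.ravel(), gy.ravel(),
                         2.0 + rng.normal(0.0, 0.002, gx.size)])
# Five isolated spurious returns, far from the panel.
spurious = np.array([[3.0, 3.0, 5.0],
                     [-3.0, 2.0, 0.0],
                     [2.5, -3.0, 4.0],
                     [-2.0, -2.5, 6.0],
                     [0.0, 4.0, -1.0]])
cloud = np.vstack([panel, spurious])

filtered, keep = remove_statistical_outliers(cloud)
```

Such a filter only addresses isolated outliers; separating the panel from the background would still require a dedicated segmentation step.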

Optical stereo images processing
This subsection describes the procedures for the creation of an optical 3D point cloud representing the calibration panel in its different poses, starting from the acquisition of a pair of stereo images.

4.2.1 Image enhancement
Underwater images are generally affected by degradations caused by the attenuation of light during its propagation in water (mostly due to absorption and scattering), such as low contrast, uneven lighting, blurring,