OFF-THE-SHELF VIDEOGRAMMETRY - A SUCCESS STORY

Ever since Brown introduced the concept of self-calibration, it has been known that nothing prevents consumer-grade devices from being used for metric purposes. Today, dSLR cameras are the acknowledged standard photogrammetric tool in applications where time is not an issue and images can therefore be taken sequentially. Although these cameras also deliver a standard video signal, there has been little interest in applying them to observe dynamic scenes. In this paper we present a methodology for using dSLR cameras for shape and motion reconstruction at a frequency of 30 Hz. Particular focus is put on calibration and orientation issues, in both the static and the dynamic case, i.e. with cameras that also change position during the measurement. The performance of the system was validated against results obtained with a system of superior quality.


INTRODUCTION
Image engineering, in other words close-range photogrammetry for custom-made solutions, has seen steady development in recent years (Maas, 2008). Because the sensors do not come into contact with the measured object and operate rapidly, for a desired period and at a desired scale, photogrammetric practices find enthusiasts across many application fields. Thanks to redundant acquisitions, the applied methods are backed by precision estimates and reliability measures, and hence become the method of choice when quantitative (geometric) evaluation is of interest. The trend that brings photogrammetry close to novel and individual approaches is clearly reflected in the range of publications covering many domains. Typical examples are close-range mapping with unmanned aerial vehicles (UAV) (Remondino et al., 2013, Schneider et al., 2013), mobile mapping (van den Heuvel et al., n.d.), recording of cultural heritage (El-Hakim et al., 2007), human motion analysis and a long array of industrial applications. The latter include development and testing in the aerospace industry (Shortis and Johnston, 1996, Pappa et al., 2002, Meyer, 2005), quality inspection in automotive manufacturing and renewable technologies (Bösemann, 1996, Shortis and Johnston, 1996, Mostofi et al., 2012), reverse engineering in the shipbuilding industry (Menna and Troisi, 2010), and in construction (Lin et al., 2008) for online quality control, robot guidance, as-built monitoring surveys, or material testing (Maas and Hampel, 2006).
The spectrum of applications is wide, and so is the spectrum of approaches. As a general rule, on-line (real-time) systems are preferred for observing dynamic events and when immediate results are expected. The sensors employed include smart cameras and machine vision cameras. According to Maas (2008), the latter are cameras accompanied by a host computer to which the data streams are written directly, e.g. Prosilica GE, PCO Dimax, GOM ARGUS 5M, AICON MoveInspect HR. Smart cameras, in contrast, are stand-alone devices (as are dSLRs) that integrate on-chip processing units and often return only coordinates rather than raw images, e.g. Optotrak Certus, Qualisys Oqus. In either case, the market offers a good selection of on-line systems with varying spatial and temporal resolutions. The technology is mature and automated to the degree that no expert knowledge is necessary to operate it. Unfortunately, prices are correspondingly higher than for the off-line systems discussed next, while accuracies are worse owing to limited redundancy.
Off-line systems are the appropriate choice when the scene is static or almost static, i.e. changes more slowly than the rate of successive image acquisitions, and immediate results are not required, e.g. monitoring of a dam or reconstruction of a cultural monument. The principal tools of off-line systems are professional stand-alone digital single-lens reflex (dSLR) cameras (e.g. Nikon D3x, Canon EOS-1Ds), available from the consumer market (Bösemann, 2011). dSLR cameras are valued for their flexibility and reasonable price, but because the devices are not inherently built for metric purposes, they lack mechanical stability, for example in the fixing of the sensor plane with respect to the camera housing. Still, recognizing the caveats and understanding their physical causes and consequences is the key to a successful, i.e. high-precision, measurement. The great ally of off-line applications is time. It allows for careful survey planning and for capturing the scene with a favourable network of images, thus ensuring fine point distribution, good intersections, and recovery of instantaneous camera calibration parameters. When combined with coded targets, the reconstruction process can be reduced to a few mouse clicks. Consequently, the repeated recovery of camera interior orientation through self-calibration is nowadays viewed as a routine background task rather than an additional effort (Fraser, 2012).
Maas (2008), Luhmann (2010) and Bösemann (2011) unanimously claim that dSLRs are not the right devices for dynamic observations. It is a fair conclusion considering the above limitations and the fact that direct interfacing is impossible. Nonetheless, few researchers have attempted to verify it empirically. Most of the reported cases use consumer-grade cameras in multi-exposure acquisitions at frequencies below 1 Hz (Benning et al., 2004, Koschitzki et al., 2011, Detchev et al., 2013). Much less interest has been shown in applying the cameras at higher frame rates, that is, in replacing single images with video (Nocerino et al., 2011).
The presented work is the outcome of a collaboration between the GEO Department and the IET and ILSB institutes of Vienna University of Technology. Low-cost videogrammetry was used as an experimental method to understand the dynamic behaviour of a structure floating on the water surface. The challenges faced were that (i) the measurement took place in a professional ship model basin imposing harsh workplace constraints, (ii) the quality of the captured video data was diminished by the lossy compression of the standard video format, (iii) the cameras were not internally synchronised, and (iv) both the surveyed scene and the cameras' positions changed during the measurement.
The paper is organized as follows. In the next section, the motivation and theoretical background of the applied methods are given. System calibration, as well as the orientation of static and dynamic cameras, are the main focus. The discussion then continues with the evaluation part. The performance of the low-cost videogrammetry is assessed against results derived from a motion capture system operating at a higher frequency and better spatial resolution. Finally, the paper closes with conclusions and future work.

METHODS
In essence, the online and offline data processing chains do not differ. They follow the same sequence of system calibration (interior and relative orientation), exterior orientation (if anticipated), and point intersection. The distinct feature of online systems is the number of employed cameras (ranging from two to tens) that work in sync, and thus the large number of frames to process. If the cameras are assumed static throughout the measurement, it is common practice to calibrate the system once, prior to the actual measurement, and to treat the parameters as constant during the measurement. Provided no interframe dependencies exist (the position of a point in the current frame is independent of its position in the previous frame), the reconstruction task becomes largely inexpensive because only the XYZ coordinates of the points in the current frame are considered unknowns.

Imaging system
The imaging setup comprised three dSLR cameras (Canon 60D, 20 mm focal length) and three continuous illumination sources (1250 W). The videos were recorded at full HD resolution (1920x1080) at a maximum of 30 fps in progressive mode. The cameras were rigidly mounted on a mobile platform (cf. Figure 1) and connected with each other, as well as with a PC, via USB cables to allow for (i) remote triggering and (ii) coarse synchronization. The videos themselves were nonetheless stored on the memory cards. No spatial reference field was embedded in the vicinity of the system; instead, calibration and orientation were carried out with the moved reference bar method. Additionally, six scale bars were arranged along the model basin.

System calibration
By system calibration the authors refer to (i) interior orientation and lens distortion parameters as well as (ii) relative orientation of cameras in a multi-ocular configuration. Interior orientation comprises the principal point, the principal distance and additional parameters (xp, yp, c, ∆x, ∆y), whereas the relative orientation rotates and translates points from the camera whose coordinate origin lies at its perspective centre, with axes aligned with the sensor axes, to the remaining cameras (R^T, T). These quantities are linked by the collinearity equation (1).
The metric quality of a photogrammetric reconstruction depends strongly on the quality of the recovered calibration parameters.
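For orientation, one common way of writing the collinearity condition with the symbols listed above is sketched here. The arrangement of terms is an assumption about notation rather than a reproduction of the authors' Equation (1); X denotes an object point, T the perspective centre, R the rotation matrix and λ a point-dependent scale factor.

$$
\begin{pmatrix} x - x_p - \Delta x \\ y - y_p - \Delta y \\ -c \end{pmatrix} = \lambda \, R^{T} \left( X - T \right)
$$

Written this way, the image-space vector on the left is the object-space vector X − T rotated into the camera frame and scaled, which is why the same relation can serve as a 3D-3D transformation when the scale is fixed and the distortion terms vanish.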
The main developments in camera calibration formulated in terms of the collinearity equation (1) happened in the 1970s and 1980s.
Brown was the first to show how radial and decentring lens distortion can be effectively modelled within the bundle adjustment, an approach later known as self-calibration. Already in 1956 he stated that nothing prevented the use of suitable commercial lenses in photogrammetry because there were means to correct for their imperfections. The polynomial formulae of the Brown model have since been successfully adopted in close-range photogrammetry (Kraus, 1997, Clarke and Fryer, 1998, Remondino and Fraser, 2006). Self-calibrating bundle adjustment is based on ray intersections of unknown 3D points, and optionally on scale information. When the latter is available, the full set of interior parameters can be recovered. No object space constraints in the form of ground control points are necessary; instead, inner or minimum constraints are enforced to remove the datum defect. In close-range photogrammetry the measured object often cannot serve as a calibration field capable of recovering reliable camera parameters, hence a temporary field is established to perform the on-the-job calibration. The strategy may, however, be cumbersome in multi-ocular online applications.
The moved reference bar method is then preferable because it avoids the laborious acquisition of multiple images, at varying roll angles, with every single camera. The method uses a calibrated bar, signalised with targets on both ends (cf. Figure 2a). The bar is moved randomly around the observation volume while the camera system tracks and records the positions of the two points in image space. Ultimately, the system calibration is calculated in the bundle adjustment, preferably as a free network, with the image measurements and the length of the bar as observations. The merit of the approach is that (i) the interior parameters of all cameras and the relative orientation are recovered in one procedure, and (ii) the complexity of finding correspondences between particular views is significantly reduced (Maas, 1998, Luhmann et al., 2011).

Target motion model
Point signalising in videogrammetric applications commonly employs circular, retro-reflective (active) or white (passive) targets, surrounded by black, matte rings. Passive targets are more susceptible to ambient lighting, and care should be taken that there is enough contrast to separate the points from the background. The advantage of retro-reflective material is that, when illuminated, it gives off a strong signal in the direction of the light source, which should also be the direction of the camera. Localizing the targets is then a trivial task as they contrast strongly with the remaining image content. Once the points are localized, their centres are typically found with centroiding methods, ellipse fitting or correlation (Shortis et al., 1995, Otepka, 2004, Wiora et al., 2004, Burgess et al., 2011).
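As an illustration of the centroiding option mentioned above, the following minimal sketch computes an intensity-weighted centroid of a small grey-value patch around a localized target. The function name, the background threshold and the patch layout (a list of rows) are assumptions made for this example; it is not the implementation used in the experiments.

```python
def weighted_centroid(patch, threshold=0.0):
    """Intensity-weighted centroid (x, y) of a 2D patch given as a list of rows.

    Grey values at or below `threshold` (an assumed background level)
    receive zero weight. Coordinates are patch-local: x = column, y = row.
    """
    total = sum_x = sum_y = 0.0
    for row_idx, row in enumerate(patch):
        for col_idx, value in enumerate(row):
            weight = max(value - threshold, 0.0)
            total += weight
            sum_x += weight * col_idx
            sum_y += weight * row_idx
    if total == 0.0:
        raise ValueError("no pixel above threshold")
    return sum_x / total, sum_y / total


# A bright 2x2 target on a uniform background of grey value 10.
patch = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
centre = weighted_centroid(patch, threshold=10)
print(centre)
```

Subtracting the background level before weighting keeps dark pixels from pulling the centre towards the middle of the patch, which matters for weakly responding targets.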
In tracking applications, the physical environment often precludes complete detection and localization of points. Firstly, freedom in applying targets and arranging cameras is hindered by workplace constraints. Secondly, as the observed object is dynamic, points may be occluded by other passing objects, or move beyond the field of view and be lost. In addition, low-cost sensors are characterized by (i) lower resolution and (ii) diminished image quality as a result of lower pixel sensitivity and applied compression. From the standpoint of mensuration algorithms, this translates into a loss of image measurement accuracy or even the loss of a tracked point.
Our experience showed that the latter is not a rare scenario. At a distance of 10 m from the camera, with decreasing imaging angle (but within the accepted angle range given by the manufacturer of the retro-reflective sheet), cross-correlation tracking was interrupted every few frames, whereas tracking based on thresholding techniques turned out to be too unreliable in the face of the poor target response. The situation could probably have been improved if the lights had been placed closer to the object of interest; workplace constraints, however, did not allow for that.
To overcome the persistent loss of tracked points, we modelled the motion of points in image space and implemented a Kalman Filter (KF) to detect their anomalous behaviour. Anomalous behaviour here means (i) wrong centroiding due to glittering effects, (ii) loss of points due to a temporary lack of illumination (point facing away from the camera), and (iii) loss of points due to temporary occlusions (cf. Figure 2).
Since the arrival of the KF in 1960, it has found applications in many fields, most notably in process control, navigation and tracking. The KF is a data processing algorithm that recursively estimates a time-controlled model. Put differently, it is a weighted least squares estimate of the actual model, computed from a vector of real measurements Z (direct or indirect) with their uncertainties and a vector of measurements predicted by the current model state (3), also including its uncertainties. In mathematical terms, there is a time-update equation (2) that projects the current state X(k−1) and error covariance P(k−1) forward to an a priori estimate, and measurement-update equations that serve as feedback to upgrade the a priori estimate to an a posteriori one (6), (7). In this predictor-corrector mode of operation, the filter is said to produce a result that is the maximum likelihood estimate. The detailed mathematics of the Kalman filter is beyond the scope of this paper; the reader is referred to (Maybeck, 1979, Welch and Bishop, 1995, Mehrotra and Mahapatra, 1997). The KF describes the model's state process with a discrete linear difference equation (time update), predicts the future measurement on its basis, and projects the error covariance forward, where A is the transition matrix and H the measurement matrix. The process and measurement noise terms w_k and v_k are assumed uncorrelated and normally distributed. The optimal model estimate (7) is a linear combination of the predicted state and the difference between a real observation and its model prediction (the innovation), weighted by the Kalman gain (5). The gain minimizes the a posteriori error covariance.
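For reference, the filter steps described above are commonly written as follows, in the spirit of the notation of Welch and Bishop (1995). The noise covariance symbols Σ_w and Σ_v, the superscript minus for a priori quantities, and the assignment of the equation numbers (in particular of (6) to the covariance update) are assumptions made here to match the numbering used in the text.

$$
\begin{aligned}
X_k &= A\,X_{k-1} + w_k, & w_k &\sim N(0, \Sigma_w) && (2)\\
Z_k &= H\,X_k + v_k, & v_k &\sim N(0, \Sigma_v) && (3)\\
P_k^{-} &= A\,P_{k-1}\,A^{T} + \Sigma_w &&&& (4)\\
K_k &= P_k^{-} H^{T} \left( H P_k^{-} H^{T} + \Sigma_v \right)^{-1} &&&& (5)\\
P_k &= \left( I - K_k H \right) P_k^{-} &&&& (6)\\
X_k &= X_k^{-} + K_k \left( Z_k - H X_k^{-} \right) &&&& (7)
\end{aligned}
$$

In (7), X_k^- = A X_{k-1} is the a priori state obtained from (2) with the zero-mean noise term dropped, and Z_k − H X_k^- is the innovation that is inspected later for outlier detection.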
Since we allow our image points to (i) manoeuvre freely, (ii) accelerate and (iii) decelerate, the motion is described using a jerk model. The state vector consists of 2D position, velocity, acceleration and jerk. With the state ordered as position, velocity, acceleration and jerk in x, followed by the same four quantities in y, the measurement matrix selects the two observed positions:

$$
H = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix} \qquad (10)
$$

To use the KF as a means of spotting bad data, we inspect the innovation values. When the discrepancy between the model prediction and the actual measurement exceeds a certain threshold, the measurement is either discarded or replaced with the prediction so that tracking can continue. Since the KF works as an interpolator, we use it exclusively for the detection of erroneous behaviour. We do not "correct" our measurements with the computed estimates.
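The innovation test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each image coordinate is filtered separately with a four-element state (position, velocity, acceleration, jerk), which is equivalent to the eight-element model only if the two axes are uncorrelated; the time step is one frame; and the process noise `q`, measurement variance `r`, initial variances and the 3-sigma `gate` on the normalised innovation are all illustrative values.

```python
import math


def transition(dt):
    """Constant-jerk transition matrix A for one axis, state (p, v, a, j)."""
    return [
        [1.0, dt, dt**2 / 2.0, dt**3 / 6.0],
        [0.0, 1.0, dt, dt**2 / 2.0],
        [0.0, 0.0, 1.0, dt],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


class AxisTracker:
    """Kalman filter for one image coordinate, used only to flag outliers."""

    def __init__(self, first_measurement, dt=1.0, q=0.01, r=0.25, gate=3.0):
        self.A = transition(dt)
        self.At = [list(col) for col in zip(*self.A)]
        self.x = [first_measurement, 0.0, 0.0, 0.0]
        self.P = [[0.0] * 4 for _ in range(4)]
        for i, var in enumerate((r, 100.0, 100.0, 100.0)):  # assumed priors
            self.P[i][i] = var
        self.q, self.r, self.gate = q, r, gate

    def step(self, z):
        """Process one measurement z; return (accepted, predicted_position)."""
        # Time update: project state and covariance forward.
        x = [sum(self.A[i][k] * self.x[k] for k in range(4)) for i in range(4)]
        P = matmul(matmul(self.A, self.P), self.At)
        P[3][3] += self.q  # assumed process noise, acting on jerk only
        # Innovation and its variance for H = [1, 0, 0, 0].
        nu = z - x[0]
        S = P[0][0] + self.r
        accepted = abs(nu) <= self.gate * math.sqrt(S)
        predicted = x[0]
        if accepted:
            # Measurement update with the Kalman gain.
            K = [P[i][0] / S for i in range(4)]
            x = [x[i] + K[i] * nu for i in range(4)]
            P = [[P[i][j] - K[i] * P[0][j] for j in range(4)] for i in range(4)]
        # A rejected measurement leaves the prediction in place (coasting).
        self.x, self.P = x, P
        return accepted, predicted


# Synthetic x-coordinate track: 2 px/frame, with a 50 px blunder at frame 20.
track = [100.0 + 2.0 * k for k in range(30)]
track[20] += 50.0
tracker = AxisTracker(track[0])
rejected = [k for k, z in enumerate(track[1:], start=1)
            if not tracker.step(z)[0]]
print(rejected)
```

The filter output is used only to decide whether a measurement is kept; accepted measurements are passed on unchanged, in line with the text. Gating on the innovation normalised by its standard deviation, rather than on a fixed pixel threshold, is a design choice of this sketch that lets the tolerance widen while the filter is still uncertain.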

Dynamic referencing
At times the constancy of the orientation elements is violated and the cameras must be dynamically referenced in order to align measurements from different epochs. Typical circumstances are (i) when the measurement volume must be enlarged during the measurement to compensate for the movement of the measured object, or (ii) when working environment conditions such as vibration affect the position of the camera system (image-variant interior orientation is not reviewed hereafter). Rotations and translations of the cameras are then considered unknowns, or observed unknowns. Their continuous recovery is possible when (i) a reference body is placed in the object scene, (ii) static, well-identifiable objects are present in the object scene, or (iii) parameter differences are observed by external sensors. Bundle adjustment then makes it possible to combine all this information and to output optimal estimates of the current camera positions. When a reference body or external reference frame is available, the measurements can be transformed to the same datum via (i) 2D-3D resection in monocular measurements, or in the multi-ocular case via (ii) sequential 3D-3D spatial similarity transformation or (iii) 2D-2D-...-3D simultaneous bundle adjustment. In either scenario, the prerequisite is that a minimum of 3 static points can be observed (Kager, 2000, Wrobel, 2001, Bösemann, 2011).
In analysing image sequences from multi-ocular systems, it is desirable to keep the number of unknowns small. Also, designing a reference frame is often not feasible due to working conditions, or simply not wanted because of the additional effort. In our experiments, the three-ocular imaging setup observed a sequence of motions of an object (OBJ). The object was defined as non-rigid and placed in a 10 m wide water basin. The system was placed on a bridge across the basin, at a distance of ca. 10 m from OBJ (cf. Figure 1). To keep the distance constant throughout the measurement, the bridge moved forward and backward according to the OBJ motion. The task was thus to discern between the OBJ and camera motions (cf. Figures 3, 4). Because (i) the possibilities to include control information in the proximity of water were limited, (ii) one of the cameras could not see any control information (cf. Figure 2), and (iii) the system of cameras was regarded as rigid, we resolved the motion of the cameras as a series of sequential 3D-3D orientations. The following workflow was adopted (Algorithm 1).
Algorithm 1: Pseudocode of dynamic referencing.
Notation: xy - image space; xyz - local frame (before motion correction); XYZ - global frame (after motion correction); GCP - control information.
Input: images, self-calibration.
Output: 4D points (XYZ in time).
Self-calibration parameters and images extracted from the videos constitute the input to the algorithm. Whenever a change in control information in image space is recorded, and later confirmed by sufficient cues, points intersected at this instant are regarded as local and are attributed to a 3D local coordinate system, a MODEL (cf. Figure 4b). A MODEL is related to the global frame by a spatial similarity transformation (Equation (1) with c = −z, λ = 1, ∆x = 0, ∆y = 0), hence it stores the respective reference parameters and a collection of 3D observations. To "subtract" the camera motion from the total motion and obtain the global object points, a least squares adjustment is carried out (cf. Figure 4c). The basis for the transformation is the static control information registered in both systems.
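The idea of "subtracting" the camera motion can be illustrated with a deliberately simplified sketch. It assumes that the camera platform only translates between epochs, so that the least squares solution reduces to the mean offset of the static control points; the adjustment described above solves the full spatial similarity transformation including rotations. Function names and coordinates are hypothetical.

```python
def estimate_shift(control_local, control_global):
    """Least-squares translation taking local control points to global ones.

    Translation-only simplification: the solution is the mean coordinate
    difference over all static control points.
    """
    n = len(control_local)
    return tuple(
        sum(g[i] - l[i] for l, g in zip(control_local, control_global)) / n
        for i in range(3)
    )


def to_global(points_local, shift):
    """Apply the estimated shift to points of one MODEL (local frame)."""
    return [tuple(p[i] + shift[i] for i in range(3)) for p in points_local]


# Static control points in the global frame (metres, made-up values).
control_global = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
# The same points as intersected after the cameras moved 0.5 m along X.
control_local = [(-0.5, 0.0, 0.0), (9.5, 0.0, 0.0), (-0.5, 5.0, 0.0)]

shift = estimate_shift(control_local, control_global)
object_global = to_global([(4.5, 2.0, 1.0)], shift)
print(shift, object_global)
```

Each MODEL would carry its own shift, estimated from whatever control information is visible at that epoch, so that object points from all epochs end up in one datum.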

EVALUATION
The measured object was a 4 m x 4 m lightweight platform, marked with 144 retro-reflective targets. It was kept above the water surface by four submerged air cushions. The experiments were performed in two series, with (i) regular and (ii) irregular waves induced. Dynamic referencing was adopted only in irregular wave conditions. The final object precision amounted to 4 mm (mean value), corresponding to a relative accuracy of 1:2500, at an average image scale of 1:500. In parallel to the operation of the low-cost system, the platform was observed by an online motion capture system of superior spatial resolution and twice the temporal resolution. The two systems used independent targeting, which impeded their direct comparison. In ship hydromechanics, the dynamics of bodies and fluids in regular waves is usually described in terms of ship motions. The motions can be split into three mutually perpendicular translations of the centre of gravity (COG) and three rotations around the respective axes (rendered in Figure 5a). The axes convention is: X axis in the direction of wave propagation, Z axis pointing upwards, perpendicular to the still water surface (Journée and Pinkster, 2002). With this in mind, the accuracy of the results was validated (i) in regular wave conditions (static cameras) using the calculated ship motions (cf. Figure 5), and (ii) in the irregular spectrum (dynamic cameras) by comparing the reconstructed scale bars with their nominal values (cf. Figure 6). As a preprocessing step in comparing the ship motions, the low-cost signals were upsampled to match the frequency of the motion capture system, and cross-correlated for fine synchronisation. The evaluated translational parts agree to a high degree. The sway values come closest to each other, while the other two differ more significantly, though still within the range of precision given by the adjustment output. The reason is the unfavourable imaging configuration: the cameras were arranged along the Y axis, with small offsets along X and Z.
Roll and pitch (there was no significant yaw in the XY plane) from the rotational part are consistent to within 18% and 6% of the maximum amplitude, respectively. The rather large departure in roll is probably due to misalignment of the planes fitted to the raw measurements of the two systems (the basis for the transformation to parallel coordinate systems, Figure 5). Unlike the translations, which are computed exclusively from 3D centroids, the rotations are more sensitive to inexact alignment of the two coordinate systems. The authors, however, did not verify this. With respect to dynamic referencing, the accuracy evaluation is also encouraging and within the range of adjustment precision. The constant trend suggests that the system does not drift as the motion proceeds.
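The preprocessing step used before comparing the ship motions (upsampling followed by cross-correlation) can be sketched as below. The sketch assumes linear interpolation for the upsampling by a factor of two and a search over integer lags only; the function names and the synthetic sinusoidal signal are illustrative and do not reproduce the measured data.

```python
import math


def upsample2(signal):
    """Double the sampling rate by linear interpolation between samples."""
    out = []
    for a, b in zip(signal, signal[1:]):
        out.extend((a, (a + b) / 2.0))
    out.append(signal[-1])
    return out


def correlation(a, b):
    """Normalised (Pearson) correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def best_lag(ref, other, max_lag):
    """Lag (in samples) by which `other` trails `ref`, found by maximising
    the normalised correlation over the overlapping samples."""
    def score(lag):
        idx = [i for i in range(len(ref)) if 0 <= i + lag < len(other)]
        return correlation([ref[i] for i in idx],
                           [other[i + lag] for i in idx])
    return max(range(-max_lag, max_lag + 1), key=score)


# Synthetic heave-like signal: a 60 Hz reference and a 30 Hz copy that is
# delayed by 4 reference samples.
ref = [math.sin(2 * math.pi * n / 40) for n in range(120)]
low = [math.sin(2 * math.pi * (2 * m - 4) / 40) for m in range(60)]
lag = best_lag(ref, upsample2(low), max_lag=10)
print(lag)
```

The search window `max_lag` should stay well below one signal period; otherwise a periodic motion such as regular waves yields several equally good lags.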

CONCLUSIONS
The work demonstrates a complete workflow for using off-the-shelf videogrammetry in industrial optical metrology. The outcome shows that the cameras have the potential to become measuring devices, yet their shortcomings must be recognized. The portability of the system, the flexibility in targeting and the form of the delivered results were appreciated by the project partners. It is worth noting that if the system were to be set up for operation by non-expert users, the flexibility would be largely reduced. Adopting the moved reference bar method to calibrate the system in one step removed the laborious image acquisition involved in individual calibrations. Random bar arrangements within the measurement volume proved capable of recovering instantaneous interior orientation and distortion parameters. The diminished image quality resulting from the lossy compression (H.264) of the standard video format turned out to be an obstacle to continuous tracking in image space. To mitigate the problem, a target motion model was introduced and modelled in time with the Kalman Filter. Due to the recursive nature of the filter, the additional computations involved did not slow down the tracking process.
In evaluating the reliability of the obtained results, accuracy, rather than precision, was of primary interest. Static and dynamic referencing using the spatial similarity transformation gave viable results. In the latter case, the trend of the evaluated scale bar lengths indicates that the camera system does not drift as the motion proceeds. It is essential to note that an appropriate distribution of control information, ideally encompassing the object volume, is a prerequisite for obtaining reliable outcomes. Future work involves more in-depth investigation of the achieved results, as well as establishing best practices for off-the-shelf videogrammetry in optical metrology, with accuracy as the highest priority.

Figure 1: Vienna Model Basin. The central part is occupied by the observed platform. The camera system is placed next to the lamps.

Figure 2: Reasons for lost tracking; a) temporary occlusions, b-d) point's normal moving away from camera's Line of Sight.

Figure 3: A set of reconstructed points (4D), in orange before and in black after the motion correction, cameras moving away from the object.The close-up figure: trajectory before motion subtraction overlaid with the corrected one; trajectories split up when cameras moved.

Figure 4: Dynamic referencing: a) camera motion and object motion occurred, b) point reconstruction in the local camera frame, c) points after the spatial similarity camera motion correction.

Figure 5: Evaluation of the results with static camera system; in red low-cost system, in black motion capture system a) definition of ship motions, b-c) rotational part, d-f) translational part of the motion.

Figure 6: Evaluation of the results with dynamic referencing. The diagrams show discrepancies of three scale bars (top, middle, bottom) between their reconstructed and their nominal values, over a duration of 70 frames. The lengths are given in mm.