Mobile Phone Based Indoor Mapping

We present a mobile phone scanning solution that offers a workflow for scanning not only small spaces, where drift can be neglected, but also larger spaces, where it becomes a major accuracy issue. LiDAR and image data are combined to build 3D representations of indoor spaces. The paper focuses on drift compensation for larger scans on the mobile phone by using AutoTag detections. We show that these can also be used to combine multiple independent scans.

One major difficulty with mobile phone-based scanning is drift, especially during longer captures, which results in the accumulation of errors in relative position and orientation. These errors render the results unusable for applications that require accuracy. Such inaccuracies typically manifest as duplicated surfaces in the point cloud or mesh. To manage drift in indoor scenarios, two common methods are used: Ground Control Points (GCPs) or a self-calibration technique that utilizes plane alignment (see, for instance, Chow et al., 2013). Both are usually applied in an offline post-processing stage.
In this study, we evaluate a method based on the automatic detection of AutoTags, as shown in Figure 1. Its main properties are:
• The AutoTags are detected in real-time during the data acquisition.
• Their position does not need to be known.
• The drift compensation runs on the mobile phone within seconds after the data acquisition is finished.
• The AutoTags are coded and can be used to define an internal coordinate system that is used to align multiple scans to each other.
In sections 2, 3 and 4 we describe the first three points and discuss the potential for combining scans in section 5.
Consumer devices, used in professional workflows, are intriguing due to their ease of use and affordability. Drones, originally designed for consumer use, are now integral to surveyors' toolboxes. Similarly, mobile phones offer an even simpler and cheaper option, allowing operation without the flight restrictions typically associated with drones.

Mobile Phone Tracking
PIX4Dcatch is a mobile phone application for iOS and Android that, in its minimalistic mode, tracks the position and orientation of the phone. The application uses the live video feed from the camera and tracks the phone using the phone sensors (accelerometers, GPS, and compass) and image features. The underlying technology is similar to the well-known Simultaneous Localization and Mapping (SLAM); see (Macario Barros et al., 2022) for an overview.
Since the computational resources on a phone are limited, SLAM optimization usually involves windowed bundle adjustment, where the optimization runs on a window of n subsequent images. Consequently, its global accuracy is limited, and the reconstruction may suffer from drift if the sequence of images grows large. However, the results can be obtained in real-time, and the accuracy of the relative positions and orientations can be considered reasonably precise.
Let p be the vector of positions and orientations p = (X, ω, κ, ϕ) = (X, Y, Z, ω, κ, ϕ) of the mobile phone camera center, i.e. the external camera parameters. Then, the SLAM procedure running in real time inside PIX4Dcatch provides reasonable estimates for the relative change of position and orientation Δp_{i,i+1} between frames i and i + 1.
Given those relative changes in position and orientation, one can compute the position and orientation p_i of a particular frame i by accumulating the relative changes from the first frame up to frame i. Mobile phone tracking is thus the first contribution available for computing the camera's external parameters p_i. In the next section, we show how to integrate tracking with real-time AutoTag detection.
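The accumulation of relative changes, and the way small errors in them build up into drift, can be illustrated with a minimal sketch. It is not the PIX4Dcatch implementation: for readability it assumes planar poses (x, y, heading) instead of the six external parameters, and the function names `compose` and `integrate` as well as the heading bias of 0.02 rad per step are hypothetical choices made for this example.

```python
import math

def compose(pose, delta):
    """Apply a relative motion `delta`, expressed in the local frame of `pose`."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    return (
        x + math.cos(theta) * dx - math.sin(theta) * dy,
        y + math.sin(theta) * dx + math.cos(theta) * dy,
        theta + dtheta,
    )

def integrate(p0, deltas):
    """Chain the relative motions delta_{i,i+1} to obtain the poses p_0 .. p_n."""
    poses = [p0]
    for delta in deltas:
        poses.append(compose(poses[-1], delta))
    return poses

# Walk a closed square: four times 1 m forward followed by a 90 degree left turn.
ideal = [(1.0, 0.0, math.pi / 2)] * 4
# The same walk with a small heading bias per step (an assumed stand-in for drift).
biased = [(1.0, 0.0, math.pi / 2 + 0.02)] * 4

end_ideal = integrate((0.0, 0.0, 0.0), ideal)[-1]
end_biased = integrate((0.0, 0.0, 0.0), biased)[-1]

# Distance between the start and the end of the loop; zero for a perfectly closed loop.
gap = math.hypot(end_biased[0], end_biased[1])
print(f"loop closure gap with biased tracking: {gap:.3f} m")
```

With exact relative motions the walk ends where it started, whereas the biased walk leaves a gap at the end of the loop. This gap is the kind of error that a loop closure can detect and correct.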

Mobile Phone loop closure with AutoTags
The mobile phone tracking suffers from drift during long captures. AutoTags are an effective method to close loops and correct drift directly on the device. They can be detected in real time during data acquisition (the catch) and considered together with the tracking information immediately after data collection is completed. There are 55 AutoTags available; two of them are shown in Figure 2. The tags can be placed at arbitrary positions, and their 3D location does not necessarily need to be known. The ideal scenario is to place them at positions where one would regularly return, ensuring that loops in the data capture can be closed.
Let m_{ij} be the 2D image location of AutoTag point j in frame i, where j = 1..5 indexes the four corners of the AutoTag and its center. Let X_j be the unknown 3D positions of the AutoTag's corners and center. Then, we can formulate an additional constraint on the camera externals p_i: each X_j, projected into frame i, should coincide with its detected image location m_{ij}.
Here, P(p_i) is the projection matrix that projects X_j into the image by using the current camera external parameters and the known camera internals.
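The constraint can be made concrete with a short sketch that projects the five AutoTag points into an image and measures the distance to their detections. It is an illustration under stated assumptions, not the formulation used in PIX4Dcatch: a distortion-free pinhole camera is assumed, and the focal length, principal point, tag size and the helper names `to_camera`, `project` and `reprojection_errors` are hypothetical.

```python
import math

def to_camera(X, R, C):
    """Transform world point X into camera coordinates: R * (X - C)."""
    d = [X[k] - C[k] for k in range(3)]
    return [sum(R[r][k] * d[k] for k in range(3)) for r in range(3)]

def project(point_cam, f, cx, cy):
    """Pinhole projection of a point in camera coordinates (Z pointing forward)."""
    x, y, z = point_cam
    return (f * x / z + cx, f * y / z + cy)

def reprojection_errors(observations, points, R, C, f, cx, cy):
    """Pixel distance between each detection m_ij and the projection of X_j."""
    errors = []
    for (u, v), X in zip(observations, points):
        pu, pv = project(to_camera(X, R, C), f, cx, cy)
        errors.append(math.hypot(pu - u, pv - v))
    return errors

# Assumed camera internals (pixels) and a 0.2 m tag placed 2 m in front of the camera.
f, cx, cy = 1000.0, 960.0, 540.0
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tag_points = [(-0.1, -0.1, 2.0), (0.1, -0.1, 2.0), (0.1, 0.1, 2.0),
              (-0.1, 0.1, 2.0), (0.0, 0.0, 2.0)]  # four corners and the center

# Detections as seen from the true camera position at the origin.
true_center = (0.0, 0.0, 0.0)
detections = [project(to_camera(X, identity, true_center), f, cx, cy)
              for X in tag_points]

errors_ok = reprojection_errors(detections, tag_points, identity, true_center, f, cx, cy)
# A camera position that has drifted 5 cm sideways no longer explains the detections.
drifted_center = (0.05, 0.0, 0.0)
errors_drifted = reprojection_errors(detections, tag_points, identity, drifted_center, f, cx, cy)
print(max(errors_ok), sum(errors_drifted) / len(errors_drifted))
```

In this setup a sideways drift of 5 cm at a distance of 2 m produces a reprojection error of 25 pixels for every tag point, which shows how sensitive the AutoTag constraint is to errors in the camera externals.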

Pix4D's Geofusion Algorithm
This algorithm runs on the mobile phone after data acquisition. Its purpose is to compute better external camera positions and orientations by integrating all available constraints, as described in Sections 2 and 3. These constraints are formulated as an energy minimization problem (Equation 4), in which α_t and α_a are the weights of the two contributions, tracking and AutoTags. Equation 4 can be optimized efficiently on the mobile phone by non-linear optimization methods with respect to p and X. The solution represents the optimal external camera parameters of all frames, with reduced drift if the AutoTags are placed such that loops in the acquisition are closed.
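How the two weighted contributions interact can be seen in a deliberately simplified toy problem. The sketch below is not the Geofusion algorithm: it assumes one-dimensional positions so that the energy becomes a linear least-squares problem that can be solved through its normal equations, whereas the real problem is non-linear. The function names `solve` and `fuse_toy`, the measurement values and the weights are assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fuse_toy(deltas, tag_obs, alpha_t, alpha_a):
    """Minimise alpha_t * sum_i (p[i+1] - p[i] - d[i])^2
             + alpha_a * sum_(i,z) (X - p[i] - z)^2   with p[0] fixed at 0.
    Unknowns are p[1..n] and the tag position X."""
    n = len(deltas)
    m = n + 1                      # p_1 .. p_n, then X
    rows = []                      # (weight, coefficients, target)
    for i, d in enumerate(deltas):
        a = [0.0] * m
        a[i] = 1.0                 # +p_{i+1}
        if i > 0:
            a[i - 1] = -1.0        # -p_i (p_0 is fixed and drops out)
        rows.append((alpha_t, a, d))
    for frame, z in tag_obs:
        a = [0.0] * m
        a[n] = 1.0                 # +X
        if frame > 0:
            a[frame - 1] = -1.0    # -p_frame
        rows.append((alpha_a, a, z))
    N = [[0.0] * m for _ in range(m)]
    rhs = [0.0] * m
    for w, a, target in rows:      # accumulate the weighted normal equations
        for r in range(m):
            rhs[r] += w * a[r] * target
            for c in range(m):
                N[r][c] += w * a[r] * a[c]
    x = solve(N, rhs)
    return [0.0] + x[:n], x[n]

# True walk: 0 -> 1 -> 2 -> 1 -> 0 (a loop). Tracking overestimates each step by 0.1.
deltas = [1.1, 1.1, -0.9, -0.9]
tracked_end = sum(deltas)          # dead-reckoned end of the loop, about 0.4 instead of 0
# The same tag is seen 0.5 ahead of the camera in the first and in the last frame.
tag_obs = [(0, 0.5), (4, 0.5)]
poses, tag = fuse_toy(deltas, tag_obs, alpha_t=1.0, alpha_a=100.0)
print([round(p, 3) for p in poses], round(tag, 3))
```

Because the tag observations are weighted much more strongly than the tracking terms in this example, the end of the loop is pulled back close to the start and the accumulated error is spread over all tracking steps.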

Validation
To validate the minimization approach described in Section 4, we conducted an indoor loop test using two mobile phones with identical settings, except that one phone utilized real-time AutoTag detection while the other did not. Consequently, one phone had its camera positions corrected by minimizing Equation 4, whereas the other relied solely on the tracking method detailed in Section 2. Both projects were set up in PIX4Dmatic with all AutoTags manually marked. No photogrammetry processing was applied in PIX4Dmatic, as the objective was solely to assess the Geofusion algorithm. The re-projection errors of the AutoTags were analyzed based on the camera externals provided by Geofusion, with results presented in Table 1. By integrating AutoTags into the Geofusion optimization, we significantly reduced drift, with the average re-projection error decreasing from 41 pixels to 7 pixels. This improvement was especially notable in areas with substantial initial drift, such as the locations of the first AutoTags.
With this result, the external camera positions are significantly improved, allowing the full photogrammetry processing to commence from a much better set of external camera parameters.

Table 1. Reprojection errors of the individual AutoTags with AutoTags enabled in Geofusion (left) and without (right), as well as the average for both cases (last row).

Combining multiple phone scans by AutoTags
The advantage of the AutoTags in substantially reducing drift is a key feature of PIX4Dcatch, enabling larger indoor scans. AutoTags can also be used to combine scans for larger projects.
As examples, we show two datasets: a 2000 m² office space and a family house. For the office space, we placed a set of 55 AutoTags around the office and acquired 5 scans, each with AutoTag detection and improved external camera parameters.
As there is no GPS indoors, each of the 5 scans is in a different coordinate system. Although the 3D location of the AutoTags is unknown, we can use them to align all scans automatically, provided there are at least 3 common AutoTags between pairs of scans.
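Why three common AutoTags are enough can be illustrated with a small sketch: three non-collinear points define an orthonormal frame in each scan, and the rotation and translation between the two frames align the scans. This is an assumption-laden illustration and not the merging procedure of PIX4Dmatic. It works only for exact, noise-free coordinates, and with more tags or noisy data a least-squares rigid alignment would be the usual choice. All function names and coordinates are hypothetical.

```python
import math

def sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def frame_from_points(p0, p1, p2):
    """Right-handed orthonormal frame built from three non-collinear points."""
    e1 = normalize(sub(p1, p0))
    e3 = normalize(cross(e1, sub(p2, p0)))
    e2 = cross(e3, e1)
    return [e1, e2, e3]

def rigid_from_three_tags(src, dst):
    """Return (R, t) such that dst_k = R * src_k + t for three matched tag positions."""
    Fs = frame_from_points(*src)
    Fd = frame_from_points(*dst)
    # R maps each source axis onto the matching destination axis: R = sum_k d_k s_k^T.
    R = [[sum(Fd[k][r] * Fs[k][c] for k in range(3)) for c in range(3)]
         for r in range(3)]
    rotated_p0 = [sum(R[r][c] * src[0][c] for c in range(3)) for r in range(3)]
    t = sub(dst[0], rotated_p0)
    return R, t

def apply(R, t, p):
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]

# Positions of three common tags in scan A and in scan B (assumed values).
# Scan B equals scan A rotated by 90 degrees about Z and shifted by (5, -1, 2).
tags_a = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
tags_b = [[5.0, -1.0, 2.0], [5.0, 1.0, 2.0], [2.0, -1.0, 2.0]]

R, t = rigid_from_three_tags(tags_a, tags_b)
# Any other point of scan A can now be expressed in the coordinate system of scan B.
moved = apply(R, t, [1.0, 1.0, 1.0])
print([round(c, 6) for c in moved])
```

Two common points would leave the rotation about the line connecting them undetermined, which is why the third, non-collinear tag is needed.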
After capturing and performing Geofusion, we conducted the full photogrammetric processing in PIX4Dmatic. The locations of the AutoTags, one of the outputs of Geofusion, were used as manual tie points. The results can be seen in Figure 3. One can observe the optimized camera positions (represented by little green dots), the AutoTags with their optimized 3D locations (green circles), and the iPhone LiDAR point cloud. The reprojection error of the AutoTags is further reduced due to the full bundle adjustment, which includes automatically extracted tie points.
To obtain the full merged scan of all 5 projects, PIX4Dmatic offers the option to merge based on common tie point names; in this case, the AutoTags serve as these tie points. This process is performed sequentially. After the final merge, we optimized the entire image block via a final bundle block adjustment, used the camera positions to recompute the iPhone LiDAR point cloud, and conducted a photogrammetric dense matching. The result is shown in Figure 4, consisting of about 5000 iPhone images and LiDAR scans. This figure also indicates the size of the office space, which is about 60 × 42 m.
The second combined scan was conducted in a family house, where data from both the exterior and various floors inside were collected. A total of 15 PIX4Dcatch scans were merged in PIX4Dmatic after each scan was processed individually. The result is shown in Figure 5. One can see six screenshots of PIX4Dmatic showcasing the final point cloud in different cutting planes.

Conclusion
In this article, we presented a mobile phone scanning solution that offers a workflow for scanning not only small spaces, where drift can be neglected, but also larger spaces where it becomes a major accuracy issue. It is based on combining mobile phone tracking with the use of AutoTags. We showed that these can also be used to combine scans from multiple independent projects. Future work will focus on including reference measurements and ground truth to fully assess the accuracy of the presented method.

Figure 1. iPhone with PIX4Dcatch. The acquired LiDAR data of the phone can be overlaid on the current image, providing real-time feedback on the completeness of the acquisition. AutoTags are detected in real time (here indicated in blue).

Figure 3. Screenshots from PIX4Dmatic show a 2000 m² office space from a top view across five different scans. In each image, one can observe the AutoTags marked in green and the iPhone LiDAR point cloud, which is colored with the images.

Figure 4. Screenshots from PIX4Dmatic of the merged datasets from Figure 3. One can observe the AutoTags in green, the approximately 5000 camera positions as small green dots, the iPhone LiDAR point cloud colored with the images, and the dense photogrammetry point cloud.

Figure 5. Screenshots from PIX4Dmatic of a house from 15 different PIX4Dcatch scans that have been merged with AutoTags. It is evident that AutoTags facilitate the modeling for both indoor and outdoor environments.