INTERACTIVE CAPTURE AND LABELLING OF POINT CLOUDS WITH HOLOLENS2 FOR SEMANTIC SEGMENTATION

Low-cost capture systems allow faster data acquisition than other systems, although the resolution obtained is lower. These systems can be used to acquire low-quality point clouds to classify with simple segmentation algorithms. In this work, we use Microsoft HoloLens 2 sensors to scan indoor 3D environments and hand tracking to position label tags that support point cloud segmentation based on heuristic algorithms. Starting from each tag, several algorithms have been designed to segment doors, windows, columns, walls, ceilings, and floors. The method was tested on 3 real case studies, obtaining F1-scores between 0.61 and 0.96 depending on the object class. These results reinforce the idea that, by combining Mixed Reality with basic point cloud processing algorithms, low-quality point cloud data can be correctly processed without resorting to complex Artificial Intelligence techniques and without labelling large amounts of samples.


INTRODUCTION
The use of Virtual and Augmented Reality systems has grown in recent years, and their applications go beyond what was expected in their design, extending to fields such as medicine, industry, and civil engineering (Park et al., 2021). In Virtual Reality (VR), a space is created without any element of real space, while in Augmented Reality (AR) the real space is superimposed with virtual elements. Mixed Reality (MR) and eXtended Reality (XR) are a combination of virtual and augmented reality technologies (Bahri et al., 2019; Guo & Prabhakaran, 2022), enabling the user to simultaneously interact with both virtual and real objects. Some authors have also considered Mobile Augmented Reality (MAR), where MR is combined with other devices such as smartphones and tablets to visualize real and virtual objects directly on a screen (Kharroubi et al., 2020).
Microsoft launched HoloLens 1 in April 2016 and HoloLens 2 in November 2019. Microsoft HoloLens is a head-mounted MR device. HoloLens 2 (Microsoft 2023a) has a measurement range of 3.1 meters to capture point cloud data, which is a limited range for measuring as-built buildings and infrastructures. However, HoloLens allows data to be captured in a short time and visualized in real time on the web through the Windows Device Portal. HoloLens has a lower resolution than other devices, such as the Leica Scanstation P30, which is a static Terrestrial Laser Scanning (TLS) system, so the data size is also smaller and easier to share or save (Bahri et al., 2019; De Geyter et al., 2022; Hübner et al., 2019, 2020a; Khoshelham et al., 2019; Navares Vazquez et al., 2023). One of the applications of HoloLens is the analysis and modelling of data from parts of buildings. For example, Bahri et al. have mentioned BIM Holoview (https://www.bimholoview.com/), a program that allows 3D models to be created in MR; these authors have also developed their own BIM system using HoloLens. Other authors have acknowledged that building models require further research, as Hübner et al. have pointed out. In 2022, a consensus on a method to measure meshes captured with HoloLens had not yet been reached. However, given the viability of the results and the low cost of the data, such methods for measuring with HoloLens can be expected soon (Zhao et al., 2022).
In this work, we extend the Reality Mesher application, which was developed by Navares Vázquez for use on Microsoft HoloLens 2. The application includes scripts from the Mixed Reality Toolkit (MRTK) library (Microsoft 2023b) for interaction in virtual reality, and it also contains open Unity code to improve the graphical user interface. The application launches virtual tags that serve as seeds in algorithms for semantic segmentation of indoor point clouds, considering the classes wall, floor, and ceiling (Navares Vazquez et al., 2023).
The Reality Mesher application (Navares Vazquez et al., 2023) has been extended with the column, door, window, and stair classes. We focus on specific methods for class segmentation. These methods have been developed considering the low-quality labelled data captured with this low-cost system. The innovation of the method lies in using simple functions that start from the initial points defined by the tags.
The rest of the document is organized as follows. Section 2 reviews recent literature on the use of MR systems for point cloud capture. Section 3 describes the proposed method, while Section 4 presents and discusses the results of implementing the proposed method in real case studies. Finally, Section 5 is devoted to concluding this work.

REVIEW OF RELATED WORK
Recent works on Microsoft HoloLens have studied the potential use of these low-cost systems to measure indoor environments, capturing the scene in the form of triangle meshes. Most of these works have analysed the resulting point cloud data by comparing them with other systems in terms of resolution, scanning time, etc. (De Geyter et al., 2022; Díaz-Vilariño et al., 2022; Hübner et al., 2019, 2020a; Weinmann et al., 2020). For example, Weinmann et al. have analysed data acquired with HoloLens and with a terrestrial laser scanner, the Leica HDS6000, investigating the quality and the geometric features of the resulting point clouds of both systems.
Besides the capture of indoor data, Microsoft HoloLens can be used to capture data outdoors, with the limitation of a maximum measurement range of 3.1 m given by the depth sensor. It is desirable to close the trajectories when capturing data indoors and outdoors. It is also recommended to complete several sequences to cover more distance and to collect more data (Chandio et al., 2022; De Geyter et al., 2022), as well as to avoid drift effects as much as possible (Hübner et al., 2019, 2020a).
Recent research has used the low resolution of data obtained with HoloLens as an advantage for fast data processing. Haitz et al. have trained a real-time Neural Radiance Field. These authors have analysed image reconstruction and geometry reconstruction using data captured with HoloLens (Haitz et al., 2023).
Other works have developed applications to load into Microsoft HoloLens to study semantic segmentation of point clouds. Point cloud segmentation techniques such as RANSAC, region growing, and DBSCAN are often applied together (Ning et al., 2023; Oh et al., 2021; Singh et al., 2023). Navares Vázquez et al. have studied the segmentation of different classes using a region growing algorithm. Agrawal et al. have developed an application consisting of an interface to label different classes for subsequent segmentation. These authors have considered wall, floor, and ceiling classes, and have also added some furniture classes such as table, chair, couch, etc. (Agrawal et al., 2022). De Geyter et al. have studied semantic segmentation with several devices, including HoloLens. These authors have compared the results between several systems such as Leica Scanstation P30, NavVis M6, and NavVis VLX, besides Microsoft HoloLens 2 (De Geyter et al., 2022). Other works have studied segmentation techniques applied to HoloLens, such as voxel-based methods. Code and datasets are available on GitHub (Hübner et al., 2020b, 2022).
Some authors have studied the potential use of HoloLens in several building processes, such as low-cost building asset tracking (Fan & Khoshelham, 2021). Some works have studied the real-time segmentation of objects through hand gestures in teleconferences to share information in meetings (Ishikawa et al., 2020; Onishi et al., 2022). Other works have analysed the segmentation of point clouds for the interaction of objects with an individual user employing HoloLens (Schütt et al., 2019).

EXTENDED APPLICATION
We use an application loaded onto HoloLens to capture point clouds. This application allows the user to capture data and place virtual tags at the exact desired location in real space. The case studies are representative of the classes that are catalogued according to the level of information needed by the RecycleBIM international project (https://recyclebim.eu/). Specifically, we consider the following classes: wall, column, floor, ceiling, door, window, and stair. Once point clouds are measured, data are processed by simple segmentation algorithms created to deal with labelled low-quality point clouds. Figure 1 shows the workflow of the proposed method.

Reality Mesher application
The application is an interactive tool for real-time labelling and visualization that can be used to capture point clouds of as-built buildings and infrastructures (Navares Vazquez et al., 2023). The classes defined in the application are wall, floor, ceiling, and other. The MRTK functionalities allow hand and eye tracking in the application. The tool captures up to three meters and continuously records the point cloud meshes, which are exported in a text format. The tool has several functionalities, such as saving data and taking a picture, but the main one is to launch virtual tags in real time to label the different classes of indoor scenes (Figure 2). These tags serve as seeds in point cloud semantic segmentation, and can be used to classify point clouds using algorithms developed in Python. The algorithms used for element segmentation are simple functions, given the low quality of the data acquired with HoloLens. The Reality Mesher application is available on GitHub (https://github.com/JucaNavazReque/Reality-Mesher).

Extended application of Reality Mesher
In this work, we add to the Reality Mesher application the classes not considered in the previous version of the application. We add to the user menu the classes: column, door, window, and stair. Therefore, the data acquired by the HoloLens will be a point cloud with seven types of tags (window, door, ceiling, floor, column, wall, and stair). As previously stated, the quality of data captured with MR systems is much lower in comparison with dedicated laser scanners. However, the possibility to interactively set tags in the point cloud while capturing greatly helps the recognition of elements in the point cloud. It should be noted that this paper does not aim to develop any sophisticated segmentation algorithm but to demonstrate that low-quality point clouds captured with HoloLens can be successfully segmented using simple algorithms taking advantage of the tags. The simple algorithms we implement are described hereafter:

• To segment door and window classes, a bounding box is created from the coordinates of the tags placed for each window and door identified during the acquisition. Points inside each bounding box are classified according to the corresponding tags.

• For wall, ceiling, and floor, we implement the Random Sample Consensus (RANSAC) method, which is commonly used for plane segmentation (Ning et al., 2023; Oh et al., 2021; Singh et al., 2023). For this purpose, the launched tags are selected as seeds for the calculation of neighbours using a KDTree function. Then, the plane containing each neighbourhood is obtained, and the wall, ceiling, and floor points are segmented from that plane. Given the low quality of datasets captured with HoloLens, the point-to-plane distance threshold ranges between 0.1 and 0.3 meters.

• To segment the column class, the column tags obtained during the acquisition are taken as reference. A KDTree algorithm is applied to obtain the neighbours of each tag. Afterwards, the radius of the column is calculated and used for segmenting the column points.

• The trajectory data obtained during the acquisition is used to segment the stair class. A height of 0.1 m is considered for the steps. Subsequently, according to the angle formed by the vectors defined by consecutive points, the stair class is segmented.
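The tag-seeded plane segmentation described for the wall, ceiling, and floor classes can be sketched as follows. This is a minimal illustration in plain Python and not the authors' implementation: a brute-force radius search stands in for the KDTree neighbour query, and the function names, the `radius`, `iterations`, and `seed` parameters, and their default values are assumptions made for the example. Only the 0.2 m default for `threshold` is taken from the 0.1 to 0.3 m range stated in the text.

```python
import math
import random


def plane_from_points(p, q, r):
    """Return (unit normal, offset d) of the plane through three points,
    or None when the points are (nearly) collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm < 1e-9:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d


def distance_to_plane(pt, plane):
    """Unsigned point-to-plane distance."""
    n, d = plane
    return abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d)


def segment_plane_from_tag(points, tag, radius=0.5, threshold=0.2,
                           iterations=100, seed=0):
    """Return indices of points lying on the plane fitted around a tag.

    radius, iterations and seed are hypothetical parameters.
    """
    rng = random.Random(seed)
    # Neighbourhood of the tag (brute force; a KD-tree would be used on real data).
    neighbours = [p for p in points if math.dist(p, tag) <= radius]
    if len(neighbours) < 3:
        return []
    best_plane, best_count = None, -1
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(neighbours, 3))
        if plane is None:
            continue  # degenerate sample, try again
        count = sum(distance_to_plane(p, plane) <= threshold for p in neighbours)
        if count > best_count:
            best_plane, best_count = plane, count
    if best_plane is None:
        return []
    # Segment the whole cloud with the plane obtained from the neighbourhood.
    return [i for i, p in enumerate(points)
            if distance_to_plane(p, best_plane) <= threshold]
```

Calling `segment_plane_from_tag(points, floor_tag)` returns the indices of the points to label as floor. Because the fitted plane is unbounded, coplanar points far from the tag are also selected, which is one reason a real pipeline may need an additional spatial check.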

RESULTS
This work shows the results of the developed method for several simple case studies consisting of two laboratories (Case Study 1 and 2) and one corridor (Case Study 3) from an academic building. To quantify the accuracy of the method, the results obtained with the semantic segmentation were compared with ground truth (Figures 4-5). The ground truth was manually generated using CloudCompare software. Several metrics were computed to measure the quality of the results. Precision (Equation 1), Recall (Equation 2), F1 (Equation 3), and Intersection over Union (IoU), also called Jaccard index (Equation 4), were calculated per class to evaluate the quality of semantic segmentation.

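$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

$$\text{IoU} = \frac{TP}{TP + FP + FN} \tag{4}$$

where TP refers to true positives, FP to false positives, and FN to false negatives.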
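As an illustration, the four per-class metrics named above (Precision, Recall, F1, and IoU) can be computed from per-point labels as in the following sketch. The function name `class_metrics` and the list-of-strings label representation are assumptions made for this example; returning 0.0 for an empty denominator is also a convention chosen here, not one stated in the paper.

```python
def class_metrics(predicted, truth, label):
    """Per-class Precision, Recall, F1 and IoU from per-point labels.

    predicted and truth are equally long sequences of class names;
    label is the class being evaluated.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(p == label and t == label for p, t in pairs)  # true positives
    fp = sum(p == label and t != label for p, t in pairs)  # false positives
    fn = sum(p != label and t == label for p, t in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}
```

For example, with ground truth `["wall", "wall", "wall", "door", "door", "floor"]` and prediction `["wall", "wall", "door", "door", "wall", "floor"]`, the wall class has 2 true positives, 1 false positive, and 1 false negative, so its IoU is 2 / 4 = 0.5.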
The method was tested in two laboratories (Case Study 1 and Case Study 2). Table 1 and Table 2 show the values obtained for the different metrics calculated. Regarding the metrics, good results are obtained for the window, ceiling, and floor classes in both case studies.
In particular, the values obtained for the Recall indicate that there are very few false negatives for these classes, meaning that each element has been completely segmented. The Precision also shows that few false positives are included in the segmented class, indicating that these classes are correctly segmented without over-detecting other classes. In the case of the window class, the few misclassified points are confused with the wall class, given the proximity of both classes. These results are obtained despite the low quality of the available data. Specifically, a threshold of 0.2 m was chosen when applying the RANSAC method for ceiling and floor, due to the low quality of the data.
The Precision and Recall values obtained for the wall class are similar to those obtained for the ceiling and floor classes. The Precision and Recall show results above 80%, although some points of the wall class were misclassified (Figure 6.a). The results obtained for the floor and wall classes are similar to those obtained in the previous case studies. The value obtained for the Recall in the door class again indicates the sensitivity of tag positioning.
The column class has a high Precision value (0.96), indicating that there are no over-detected elements for this class.However, the value of Recall is lower, indicating that the sensitivity of tag positioning affects semantic segmentation, as with smaller surface elements, such as the door class.
The Recall indicates that the stair class is correctly segmented, with a value of 0.93. However, the Precision indicates that some of the segmented elements do not correspond to the stair class.
Tag positioning and the generation of the ground truth affect the results obtained in the metrics, as both point clouds are compared to obtain the metric values. The errors can be explained by the difficulty in manually segmenting the point cloud to create the ground truth due to low-quality data (Figure 8.a). In addition, some other errors can come from erroneous placement of tags (Figure 8.b). Although the ground truth corresponds to the manually segmented point cloud, more reliable data would be desirable as ground truth.

CONCLUSIONS
The objective of this work was to segment the classes from the tags, which are provided by the Reality Mesher application.
Although data captured with low-cost systems such as HoloLens is generally of low quality, the possibility of interactively assigning tags during the capture process makes it possible to use very simple segmentation functions with good performance. The geometry of the point cloud and the tags obtained during the acquisition were used to obtain the column, window, and door classes. The RANSAC algorithm and the tags were used to segment the ceiling, floor, and wall classes. The trajectory data obtained during the acquisition was used to segment the stair class.
The ceiling and floor classes yield the best results, with Precision higher than 75% and Recall higher than 99% in two of the three case studies.
Figure 3 illustrates a piece of point cloud with some tags.

Figure 1. General workflow of the method. a) Point cloud captured with HoloLens 2, b) tagged labels, c) segmented point cloud.

Figure 3. Data acquired with HoloLens 2 including several tags.

Semantic segmentation of point clouds
Afterwards, point clouds are directly segmented into eight classes: window, door, ceiling, floor, column, wall, stair, and other. Tags taken during the scanning are used as seeds. Four tags define the corners of each door or window, and one tag defines each of the remaining elements: wall, ceiling, floor, column, and stair.
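The use of four corner tags for a door or window can be sketched as an axis-aligned bounding-box test. This is an illustrative example under stated assumptions, not the authors' code: the function names are hypothetical, and the `margin` parameter (0.1 m here) is an assumption added because four roughly coplanar tags would otherwise produce a box with almost no thickness.

```python
def bounding_box(tags, margin=0.1):
    """Axis-aligned box enclosing the tags, grown by a hypothetical margin (m)."""
    mins = tuple(min(t[k] for t in tags) - margin for k in range(3))
    maxs = tuple(max(t[k] for t in tags) + margin for k in range(3))
    return mins, maxs


def label_points_in_box(points, labels, tags, label, margin=0.1):
    """Return a copy of labels where points inside the tag box get `label`."""
    mins, maxs = bounding_box(tags, margin)
    result = list(labels)
    for i, p in enumerate(points):
        if all(mins[k] <= p[k] <= maxs[k] for k in range(3)):
            result[i] = label
    return result


# Usage: four tags at the corners of a 1 m x 2 m door lying in the plane y = 0.
door_tags = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 2.0), (0.0, 0.0, 2.0)]
cloud = [(0.5, 0.02, 1.0), (0.5, 0.5, 1.0), (2.0, 0.0, 1.0)]
labelled = label_points_in_box(cloud, ["other"] * len(cloud), door_tags, "door")
```

An axis-aligned box is the simplest choice; for doors or windows that are not aligned with the coordinate axes, an oriented box would enclose fewer wall points.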

Figure 6. Case Study 1. a) Column 5 with false negatives. Walls with false positives. b) Zoom in to Column 5. c) Zoom in to Column 1.

Figure 7. Case Study 2. a) Ground truth labelled with a missing column label. b) Correctly segmented columns.

The method was also tested in Case Study 3. Window and ceiling classes are not represented in the point cloud in Case Study 3. In contrast to the previous case studies, Case Study 3 includes the stair class. Table 3 shows the metrics obtained.

Figure 8. Case Study 3. a) Ground truth. Door and wall errors. b) Segmented point cloud with erroneous placement of door tags.

Table 1. Metrics of Case Study 1.

Table 2. Metrics of Case Study 2.

Table 3. Metrics of Case Study 3.

The Precision values of the window and door classes show that tags provide important information for obtaining these classes. The Precision and Recall values of the column class indicate that good positioning of the tags is necessary to obtain the anticipated results. Although the data is of low quality, the segmented point clouds correspond to the expected results. Further research could expand the current study by incorporating more building elements, such as railing and beam classes, in order to provide more complete data.

029193-I funded by MCIN/AEI/10.13039/501100011033 and FSE "El FSE invierte en tu futuro", by grant ED431F 2022/08 funded by Xunta de Galicia, Spain-GAIN, and by the projects PID2021-123475OAI00 funded by MCIN/AEI/10.13039/501100011033/FEDER, UE, and PCI2022-132943, funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU"/PRTR. The statements made herein are solely the responsibility of the authors.