A 3D BUILDING INDOOR-OUTDOOR BENCHMARK FOR SEMANTIC SEGMENTATION
Keywords: 3D Building Benchmark, Point Cloud, Mesh, Indoor Outdoor Dataset, Random Forest
Abstract. Machine learning (ML) and deep learning (DL) algorithms both require high-quality training samples with precise and thorough annotations to work effectively. The 3D building indoor-outdoor dataset (BIO dataset) is established as a canonical benchmark for semantic segmentation of 3D building point clouds and meshes, offering high accuracy, a high level of detail, and high coverage. It contains 100 building models whose structural elements are annotated with 11 semantic categories. The buildings have an average of 75,587 triangular faces each, and the dataset covers a total area of 481,769 square meters. To verify the dataset's usability, semantic segmentation was carried out with the Random Forest ML algorithm. With 10% of the segments of each building randomly chosen as training data, a weighted F1 score of 96.64% was obtained. For applications involving building geometry data, the BIO dataset can support a broad class of recently developed ML and DL methodologies.
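The evaluation protocol summarized in the abstract (a random 10% of each building's segments used for training, scored by weighted F1) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `split_per_building` and `weighted_f1`, the rounding rule for the training count, and the class labels in the usage note are hypothetical. The weighted F1 is assumed to follow the common definition, in which per-class F1 scores are averaged with weights equal to each class's share of the ground-truth labels.

```python
import random
from collections import Counter


def split_per_building(building_ids, train_fraction=0.10, seed=0):
    """Sample train_fraction of the segments of EACH building for training.

    building_ids[i] is the building that segment i belongs to.
    Returns (train_indices, test_indices). Rounding to the nearest integer
    with a minimum of one training segment is an assumption of this sketch.
    """
    rng = random.Random(seed)
    by_building = {}
    for idx, building in enumerate(building_ids):
        by_building.setdefault(building, []).append(idx)

    train_idx, test_idx = [], []
    for building in sorted(by_building):
        indices = by_building[building][:]
        rng.shuffle(indices)
        n_train = max(1, round(train_fraction * len(indices)))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score
```

As a usage example with made-up labels, `weighted_f1(["wall", "wall", "floor", "roof"], ["wall", "floor", "floor", "roof"])` evaluates to 0.75: the per-class F1 scores are 2/3 for wall, 2/3 for floor, and 1 for roof, weighted by 0.5, 0.25, and 0.25. In a full experiment, a classifier would be trained on the segments returned in `train_indices` and scored on those in `test_indices`.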