IDENTIFICATION OF RELEVANT POINT CLOUD GEOMETRIC FEATURES FOR THE DETECTION OF PAVEMENT CRACKS USING MLS DATA

The maintenance of road infrastructures is one of the main challenges that transportation authorities must face to guarantee the safe mobility of people and goods. Novel remote monitoring technologies offer advanced solutions for this issue, allowing the inspection of large sections of the network in a time-effective way. In this paper, we introduce a methodology for the detection of cracks on road pavements using point clouds acquired with a mobile laser scanner. First, the points of the cloud are labelled as pavement or cracks based on field annotations, and local geometric features of the points are calculated using principal component analysis. Two different machine learning classifiers, Support Vector Machine (SVM) and Random Forest, are then trained to identify crack points using the point feature data. The crack points predicted by the classifiers are clustered as individual instances and compared to the corresponding ones from a test dataset. Although pointwise performance of the method is modest, it can correctly identify and measure areas of the pavement affected by cracking.


INTRODUCTION
Road transport infrastructures play a crucial role in modern societies, providing the backbone for the transportation of goods and people and therefore contributing greatly to development and economic growth. One of the most critical challenges that road infrastructures face is their advanced age, which creates the need to develop innovative inspection and maintenance technologies. Traditionally, road condition monitoring has been based mostly on visual inspections carried out on-site by qualified personnel [1], which involve long execution times and high costs and raise concerns about work safety. Modern remote monitoring techniques, however, offer solutions to most of these problems. Regarding the condition monitoring of road pavements, the identification and rating of surface distresses is one of the primary concerns of administrations and infrastructure managers, so the use of remote monitoring solutions capable of acquiring large amounts of data quickly and comprehensively is vital. The most widely used remote monitoring technologies for road surveys are cameras and LiDAR (Laser Imaging Detection and Ranging), so both 2D and 3D representations of the areas of interest can be obtained.
Our focus in this research work is on pavement cracks, a type of distress that appears very commonly in roads, especially in those subjected to heavy traffic loads. Most of the available research on pavement crack detection is based on camera images. Usual crack detection methods include the use of morphological filters for crack edge detection [2], the application of the wavelet transform to images to separate high-frequency components associated with cracks [3], the derivation of the minimum spanning tree from a graph of crack seeds [4], and Gabor kernels with different orientations that align with the cracks [5]. The detection of cracks in images containing 3D information, obtained via stereovision techniques, has also been studied [6], which allows the elevation of the pavement to be analysed instead of relying only on intensity information. In recent years, the dominant trend in crack detection has been the use of machine learning techniques for image classification, such as random structured forests [7], Support Vector Machine [8], Feature Pyramid and Hierarchical Boosting Network [9] or Convolutional Neural Networks [10] [11], some of which are capable of achieving pixelwise accuracy in segmentation results, like CrackNet [12] and CrackU-net [13].
Laser-imaging techniques, also known as structured laser light, offer the most comprehensive solution for crack detection and measurement, as they combine images and range information obtained with laser profilers mounted on top of a survey vehicle, generating 3D data of the pavement surface. Laser profilers simultaneously project multiple laser beams onto the ground, generally covering a full lane width with a separation of only a few millimetres between beams. To make cracks stand out in these 3D pavement images, a best-fitting surface can be adjusted so that the high-frequency components of the depth image, which are likely associated with cracks, are extracted. To do so, Ouyang et al. [14] used the Haar transform, a type of wavelet transform. Alternatively, Li et al. [15] used the Fast Fourier Transform to separate the high-frequency components of each individual scanning line, obtaining potential crack points based on an elevation threshold. In a similar way, Zhang et al. used PCA (Principal Component Analysis) to calculate the elevation of small segments of points along individual scanning lines and generate a control profile, analysing the saliency of points in order to detect the cracks. Another solution for detecting cracks in 3D laser images is the generation of crack probability maps obtained by path voting [16] or by a minimal path algorithm [17], [18].
MLS (Mobile Laser Scanner) systems combine LiDAR sensors with navigation and positioning units to cover large areas of road quickly while acquiring comprehensive and accurate data of the road surface and its surroundings in the form of georeferenced 3D point clouds. They also allow the implementation of methodologies with a high level of automation that can analyse large datasets in short times [19], [20]. Compared to other remote monitoring technologies, MLS systems avoid typical camera problems related to variable illumination conditions or the presence of shadows, and they are more versatile than laser-imaging systems because of their ability to scan roads and their entire environment from different angles and directions. In any case, laser scanners can also be integrated with other sensors, such as cameras or thermal sensors [21], [22], to provide a more complete analysis of road surface conditions. The use of raster images generated from point clouds to detect crack points has been explored in various works, either by setting a threshold to filter candidate crack points based on point intensity information [23] or by applying a high-pass filter convolution to detect local elevation changes [24], which could indicate the presence of cracks. De Blasiis et al. [25] proposed generating a digital elevation model of the road from the point cloud and then calculating the roughness to detect points deviating from this model. Evaluating individual scanning lines obtained by the MLS is also feasible for identifying deviating points related to distresses [26].
In this article, we introduce a methodology to evaluate the suitability of MLS systems for detecting pavement cracks based solely on point cloud geometry. Pavement cracks can be categorised in multiple ways, but they can be reduced to four main types: longitudinal, transversal, block, and alligator cracking [27]. In our research, we focused on detecting longitudinal cracks, which are the most severe in our dataset, with narrower transversal and alligator cracks starting from them. Local geometric features were obtained for all points, following a point neighbourhood approach, to capture information indicating the presence of cracks. The most relevant features were then selected to train two machine learning models, Support Vector Machine and Random Forest, to classify the points of the cloud according to the labelled training set provided, which was obtained by marking the cracks before the point cloud acquisition.

MATERIALS
Our case study consists of a 20 m stretch of pavement in the access lanes of a parking lot inside the campus of the University of Vigo. The whole area is affected by cracking and other types of distresses. We marked all of them with high-reflective paint, drawing a wide line along each crack so that it could be easily captured by the MLS thanks to the intensity contrast between the paint and the pavement. To assess the severity of all marked cracks, we followed the guidelines of the Distress Identification Manual of the US-FHWA [28], measuring every crack at different points with a calliper (Figure 1a) to calculate its mean width and classify it according to three severity levels (low, moderate, and high). The most relevant distresses in this case study are longitudinal cracks of moderate-to-high severity and low-severity transversal cracks.
After marking the distresses, we conducted the survey of the area using a vehicle equipped with an MLS system to acquire the point cloud (Figure 1b). The scanner employed was a RIEGL VUX-1HA set to its maximum point acquisition rate of 1000 kHz, in combination with a Trimble Zephyr 3 GNSS antenna and a STIM300 IMU (Inertial Measurement Unit) for the positioning and georeferencing of the point cloud.

PAVEMENT CRACK DETECTION
The workflow of our methodology is as follows. First, a ground truth is established based on the intensity data of the point cloud. We then calculate features for each point based on local geometry and select the most relevant ones. The last step consists of training two different machine learning models with the feature array so that they can label the points as pavement or cracks.

Pre-processing
To obtain a ground truth of the cracks to be detected by our algorithm, we adjusted the intensity channel of the point cloud to maximize the contrast between painted crack areas and the rest of the pavement, allowing the segmentation of these points. First, intensity was adjusted according to the scanning angle of the points, in order to homogenize it between the centre and the edges of the pavement section, as it varies with the incidence angle of the laser on the ground (Figure 2). Next, we segmented the point cloud into crack and non-crack points using a fixed, manually set threshold. The point cloud was then rasterized and a Canny edge detector [29] was applied to extract the edges of the cracks and remove noisy points, followed by a closing operation to completely segment the cracks.
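The thresholding, rasterization and closing steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the scan-angle intensity correction and the Canny step are omitted, and the cell size, the threshold value and the function names `rasterize_mask` and `binary_closing` are assumptions made for illustration on synthetic data.

```python
import numpy as np

def rasterize_mask(xy, is_crack, cell=0.02):
    """Boolean grid that is True in every cell containing a crack-labelled point."""
    origin = xy.min(axis=0)
    idx = np.floor((xy - origin) / cell).astype(int)
    mask = np.zeros(tuple(idx.max(axis=0) + 1), dtype=bool)
    mask[idx[is_crack, 0], idx[is_crack, 1]] = True
    return mask

def _neighbourhood(mask):
    """Stack the 3x3 neighbourhood of every cell (cells outside the grid are False)."""
    padded = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    return np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def binary_closing(mask):
    """Morphological closing: 3x3 dilation followed by 3x3 erosion."""
    padded = np.pad(mask, 1, constant_values=False)  # room for the dilation to grow
    dilated = _neighbourhood(padded).any(axis=0)
    eroded = _neighbourhood(dilated).all(axis=0)
    return eroded[1:-1, 1:-1]                        # crop back to the input extent

# Synthetic example: a bright painted stripe on darker pavement.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 1.0, size=(5000, 2))
intensity = rng.normal(0.2, 0.05, size=5000)
intensity[np.abs(xy[:, 1] - 0.5) < 0.03] += 0.6
is_crack = intensity > 0.5            # hypothetical manually set threshold
raw = rasterize_mask(xy, is_crack)
mask = binary_closing(raw)
```

The closing only fills small gaps between neighbouring crack cells, so every cell marked in `raw` is still marked in `mask`.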

Point feature selection
Our aim was to obtain a set of features that are relevant for identifying the presence of cracks in the pavement based only on point geometry. Therefore, we calculated different features that are commonly used to emphasize inter-class differences among points in the cloud. First, we determined the local neighbourhood of each point. A k-dimensional tree was constructed from the XYZ coordinates of the points, using the Euclidean distance to determine the $k$ closest neighbours of each point. We applied PCA to obtain the eigenvalues ($\lambda_1, \lambda_2, \lambda_3$) and eigenvectors ($e_1, e_2, e_3$) corresponding to each point neighbourhood, again considering only their XYZ coordinates. We calculated 8 eigenvalue-based features as indicated by Weinmann et al. [30]. Their equations are presented in the first 8 entries of Table 1. The variance of curvature among the $k$ neighbours [31], the verticality of the normal vector (i.e. the third eigenvector, $e_3$) [32] and the variance in the normal direction, which equates to the third eigenvalue, were calculated as well. Additionally, we introduce the parallelism of normal vectors, which results from calculating the mean parallelism between the normal vector of a point and those of its neighbours. We also calculated the variance and standard deviation in elevation (Z) among neighbouring points and the surface roughness, using both the arithmetic mean and the root mean square (RMS).
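A minimal sketch of the neighbourhood analysis is given below. It computes only three of the features (change of curvature, verticality and standard deviation of elevation) and, to stay dependency-free, uses a brute-force neighbour search instead of the k-d tree mentioned above. The function name `local_features` and the verticality convention (one minus the absolute Z component of the normal) are assumptions for illustration.

```python
import numpy as np

def local_features(points, k=150):
    """Per-point PCA features from the k nearest neighbours (brute-force search).

    Columns: change of curvature, verticality, standard deviation of Z.
    """
    feats = np.zeros((len(points), 3))
    for i, p in enumerate(points):
        d2 = np.sum((points - p) ** 2, axis=1)
        nbrs = points[np.argsort(d2)[:k]]        # neighbourhood includes the point itself
        cov = np.cov(nbrs, rowvar=False)         # 3x3 covariance of XYZ coordinates
        evals, evecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
        lam3 = max(evals[0], 0.0)                # smallest eigenvalue, clipped at zero
        normal = evecs[:, 0]                     # eigenvector of the smallest eigenvalue
        feats[i, 0] = lam3 / max(evals.sum(), 1e-12)   # change of curvature
        feats[i, 1] = 1.0 - abs(normal[2])             # verticality (assumed convention)
        feats[i, 2] = nbrs[:, 2].std()                 # standard deviation in elevation
    return feats

# Usage on a synthetic, perfectly flat patch: all three features are close to zero.
rng = np.random.default_rng(0)
flat = np.column_stack([rng.uniform(0.0, 1.0, (200, 2)), np.zeros(200)])
flat_feats = local_features(flat, k=20)
```

For a real point cloud of millions of points, a spatial index such as a k-d tree would be needed, since the brute-force search above scales quadratically with the number of points.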
Table 1. (columns: Feature, Equation)
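For reference, the eight eigenvalue-based features are commonly written as follows, assuming eigenvalues sorted as $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq 0$. These are the standard definitions from the literature and may differ in notation or normalisation from the entries of Table 1; in particular, the eigenentropy is often computed on eigenvalues normalised by their sum.

```latex
\begin{align*}
\text{Linearity:} \quad & L_\lambda = \frac{\lambda_1 - \lambda_2}{\lambda_1} &
\text{Planarity:} \quad & P_\lambda = \frac{\lambda_2 - \lambda_3}{\lambda_1} \\
\text{Sphericity:} \quad & S_\lambda = \frac{\lambda_3}{\lambda_1} &
\text{Omnivariance:} \quad & O_\lambda = \sqrt[3]{\lambda_1 \lambda_2 \lambda_3} \\
\text{Anisotropy:} \quad & A_\lambda = \frac{\lambda_1 - \lambda_3}{\lambda_1} &
\text{Eigenentropy:} \quad & E_\lambda = -\sum_{i=1}^{3} \lambda_i \ln \lambda_i \\
\text{Sum of eigenvalues:} \quad & \Sigma_\lambda = \lambda_1 + \lambda_2 + \lambda_3 &
\text{Change of curvature:} \quad & C_\lambda = \frac{\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3}
\end{align*}
```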

Machine learning models training and testing
We used two types of machine learning classifiers for identifying cracks on the pavement section: a Support Vector Machine (SVM) and a Random Forest. The inputs for the classifiers are a feature array $X$ of size $n \times m$, with $n$ points/samples and $m$ selected point features, and a class label array $y$ of size $n \times 1$. We ran binary classifications, the two available classes being 0 for pavement or non-crack points and 1 for points belonging to cracks.
To detect cracks among the candidate points, we used the DBSCAN algorithm [33] to cluster crack points into individual defects and computed the intersection over union of detected and ground truth areas to assess the results.
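The clustering idea can be sketched with a small pure-Python version of DBSCAN. This is an illustrative sketch, not the implementation used in the study; the `eps` and `min_samples` values in the example are hypothetical, and the quadratic neighbour search is only suitable for small point sets.

```python
from math import dist

def dbscan(points, eps, min_samples):
    """Return one cluster label per point (-1 marks noise)."""
    n = len(points)
    # Neighbour lists include the point itself.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1                 # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not None:
                continue                   # already assigned, do not expand again
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])    # core point: keep growing the cluster
    return labels

# Two separate linear "cracks" and one isolated noisy point.
crack_a = [(0.1 * i, 0.0) for i in range(10)]
crack_b = [(0.1 * i, 5.0) for i in range(10)]
labels = dbscan(crack_a + crack_b + [(50.0, 50.0)], eps=0.15, min_samples=3)
```

In this example the two lines receive labels 0 and 1 and the isolated point is flagged as noise.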

EXPERIMENTS
Considering the density of the point cloud and the size of the cracks to be detected, different neighbourhood sizes were tested, from $k = 20$ to $k = 200$. After experimentation we selected $k = 150$ as a satisfactory middle ground, which corresponds to a search radius of around 5.5 cm at the centre of the road and 9 cm at the edge, as the density of points decreases with distance from the survey vehicle.
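As a rough check of what these radii imply, assume that the neighbours of a point are spread uniformly over a flat disc of radius $r$ (an idealisation not stated in the text). The local point density $\rho$ then follows from $k \approx \rho \pi r^2$:

```latex
\begin{align*}
\rho_{\text{centre}} &\approx \frac{k}{\pi r^2} = \frac{150}{\pi \, (0.055\ \text{m})^2} \approx 1.6 \times 10^{4}\ \text{points/m}^2 \\
\rho_{\text{edge}} &\approx \frac{150}{\pi \, (0.09\ \text{m})^2} \approx 5.9 \times 10^{3}\ \text{points/m}^2
\end{align*}
```

Under this assumption the density at the edge of the road is a little over one third of the density at the centre.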
To select the most suitable features to train the classifiers, we first evaluated the distribution of each feature across the point cloud, so that we could identify those providing the most useful information about pavement crack location and severity. To do so, we visualized the point cloud using the CloudCompare software, colouring the points by one feature at a time to check the contrast between crack and non-crack points, as well as the statistical distribution of the feature values. After analysing them, we selected the following features from Table 1: change of curvature, curvature variance, verticality, normal parallelism, standard deviation in the Z-axis and RMS roughness. In Figure 3 the same section of the pavement point cloud from Figure 2 is represented multiple times, each time coloured by the values of one of the selected features. Indications of the most severe cracks can be appreciated, mainly due to deviations in point elevation and in the orientation of the normal vectors. We found that other features, such as planarity, did not make the crack points stand out from their surroundings, or did so only slightly. The coordinates of the points were not included in the feature array, to avoid a point location bias when training the models.
We trained the classifier models using 1 million points sampled from the 2,063,151 total points of the cloud. In order to detect cracks of moderate severity and above, we assigned labels to the points as follows: pavement (0), which includes areas without distresses and low-severity cracks, and cracks (1), which includes moderate- and high-severity ones. Other types of distresses were not considered. The feature and label arrays were split into training and testing datasets following an 80/20 proportion.
The SVM employed for classifying the points is based on a Radial Basis Function (RBF) kernel and a regularization value of $C = 1$. The Random Forest model was constructed with 100 trees/estimators. In both cases, classes were balanced because of the high proportion of pavement points in the dataset compared to crack points, so the weight of the samples was adjusted to be inversely proportional to class frequencies. The point labels predicted by the classifiers were compared to those of the test set to assess their performance.
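The configuration described above can be sketched as follows, assuming scikit-learn (the text does not name the library used) and synthetic data in place of the real feature array. The sample size, the synthetic class separation and the `random_state` values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the feature array: 6 features, about 10 % crack points.
rng = np.random.default_rng(0)
n = 2000
y = (rng.random(n) < 0.1).astype(int)
X = rng.normal(size=(n, 6)) + 1.5 * y[:, None]

# 80/20 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# class_weight="balanced" weights samples inversely to class frequencies.
models = {
    "SVM": SVC(kernel="rbf", C=1.0, class_weight="balanced"),
    "Random Forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0),
}

scores = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    scores[name] = (precision_score(y_test, pred), recall_score(y_test, pred))
```

Each entry of `scores` holds the pointwise precision and recall of the crack class on the held-out 20 % of the samples.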
The crack point sets retrieved by each classifier were processed with the DBSCAN algorithm to cluster related points into individual crack instances. After projecting the point cloud onto the XY plane, we computed the Convex Hulls of these instances and compared those obtained through the classification models with the corresponding ones from the ground truth, calculating the intersection and union of these polygons to assess the feasibility of detecting individual crack instances.
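The area-based comparison of two Convex Hulls can be sketched in pure Python as shown below. This is a minimal illustration under stated assumptions: hulls are built with the monotone chain method and intersected with Sutherland-Hodgman clipping, which may not match the tools used in the study. The function names are hypothetical, and the perimeter-based variant is not covered because its exact definition is not given in the text.

```python
def _cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Monotone chain convex hull; vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula (absolute area)."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2.0

def clip_convex(subject, clipper):
    """Intersect `subject` with a convex, counter-clockwise `clipper` polygon."""
    out = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not out:
            break
        current, out = out, []
        for p, q in zip(current, current[1:] + current[:1]):
            dp, dq = _cross(a, b, p), _cross(a, b, q)   # >= 0 means inside this edge
            if (dp >= 0) != (dq >= 0):                  # segment pq crosses the edge
                t = dp / (dp - dq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if dq >= 0:
                out.append(q)
    return out

def hull_iou(points_a, points_b):
    """Area intersection over union of the convex hulls of two 2D point sets."""
    hull_a, hull_b = convex_hull(points_a), convex_hull(points_b)
    inter = polygon_area(clip_convex(hull_a, hull_b))
    union = polygon_area(hull_a) + polygon_area(hull_b) - inter
    return inter / union if union > 0 else 0.0

# Two 2 x 2 squares overlapping by half: intersection 2, union 6.
iou = hull_iou([(0, 0), (2, 0), (2, 2), (0, 2)], [(1, 0), (3, 0), (3, 2), (1, 2)])
```

For narrow, elongated hulls such as those of cracks, small lateral offsets reduce the overlapping area quickly, which is why the area IoU is sensitive to slight differences in width.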

RESULTS
The raw performance achieved by the SVM was moderate (Table 2), but within our expectations given the nature of the data employed. This classifier can correctly identify points along most of the cracks considered, although the retrieved crack clusters are incomplete (Figure 3). The crack located at the left verge of the road, which represents 19% of the crack points in the ground truth, was ignored by both models (crack 0 in Figure 3). Our assumption is that point resolution was too low at that distance from the LiDAR scanner (around 4 m), so the algorithm could not identify salient characteristics of the points. Limitations of our ground truth must also be noted, as cracks were roughly delineated based on the lines painted on the ground and recorded in the point cloud, which are wider than the cracks themselves. The pointwise precision offered by this model was also low, mostly due to noisy points, so we used a mode filter to remove part of them, which improved general performance.
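A mode filter on predicted labels can be sketched as follows. The text does not specify the neighbourhood used for the filter, so this sketch assumes the k nearest points in the XY plane; the function name `mode_filter` and the value of `k` are hypothetical.

```python
from collections import Counter
from math import dist

def mode_filter(points, labels, k=5):
    """Replace each label by the most frequent label among the k nearest points.

    The neighbourhood includes the point itself; an odd k avoids ties for binary labels.
    """
    filtered = []
    for p in points:
        nearest = sorted(range(len(points)), key=lambda j: dist(p, points[j]))[:k]
        votes = Counter(labels[j] for j in nearest)
        filtered.append(votes.most_common(1)[0][0])
    return filtered

# A single isolated "crack" prediction surrounded by pavement is removed.
line = [(float(i), 0.0) for i in range(10)]
noisy = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
cleaned = mode_filter(line, noisy, k=3)
```

Isolated false positives are suppressed, while points inside a dense group of crack predictions keep their label.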
Results from the Random Forest classifier show much better precision, as it generates less noise, but at the expense of underestimating the presence of crack points, resulting in a poor recall value (Table 2). The crack points retrieved with this method are also spread along most of the length of the actual cracks, so the identification of cracks is still feasible. In both cases, the DBSCAN algorithm was able to cluster together points from 3 of the 4 cracks present in the pavement section, the exception being the one located at the verge of the road, as mentioned before. In Figure 4 the Convex Hull boundaries of each crack cluster from the ground truth set are displayed and compared to the corresponding ones predicted by the SVM and the Random Forest models, respectively. We also calculated the intersection over union (IoU) of the Convex Hulls of corresponding cracks between the ground truth set and each model. Because the resulting polygons are markedly narrow, we used both their area and their perimeter to calculate two different IoU parameters. Using both measures helps to assess the agreement between ground truth data and predictions: when comparing the boundaries of cracks in Figure 4 and the IoU values in Table 3, it can be noted that slight differences in shape or width can lead to a reduced area IoU, while perimeter values still provide valuable information about region similarity. Crack #2 detected by the SVM illustrates this effect, as the presence of noisy points around the crack creates a wider boundary, so its area is much larger than that of the ground truth, leading to a low IoU. However, their perimeter values are similar, as both represent the full length of the crack. Based on this, we can claim that the detection of moderate- or high-severity cracks in the vicinity of the MLS vehicle is feasible, as perimeter IoU values are over 70% for all detected cracks in the SVM case and for all but one in the Random Forest case. We discard the possibility of detecting cracks in adjacent lanes of the road [34]: the ones correctly detected are within 2.5 m of the vehicle on either side, while the crack located at 4 m was missed by both models.

CONCLUSIONS
In this paper, we introduce a method for identifying cracks in road point clouds acquired with Mobile Laser Scanners. Local geometric features of the points are obtained by evaluating their neighbourhoods, using PCA to get their eigenvalues and eigenvectors so that point dispersion and spatial distribution can be assessed. Then, two different machine learning models, Support Vector Machine and Random Forest, are trained to classify points of the cloud either as regular pavement or as cracks. The training vector is constructed by sampling points across the evaluated cloud section and selecting their most relevant features, and the target values vector contains class labels assigned by marking crack-affected areas on the ground. Crack points predicted by the models are clustered and extracted as individual cracks, so they can be measured against the ground truth set.
Although pointwise performance of the machine learning models is modest, they can identify enough crack points to correctly delineate and measure the boundaries of pavement areas affected by cracking. Cracks located on lanes adjacent to that of the survey vehicle are the exception, as the point resolution of the scanner decreases with distance. In short, it is possible to detect pavement sections severely affected by cracks using MLS data, despite its reduced resolution compared to other monitoring technologies. This way, MLS surveys that were carried out for different purposes, such as object semantic segmentation or infrastructure modelling, can also be useful to evaluate road condition and identify severe distresses.
Future lines of work may focus on using other types of classification tools, such as deep learning networks for point cloud processing, that could improve the results achieved by our models. The evaluation of MLS systems with greater point resolution is also encouraged, as resolution is the main limitation for the detection of the more subtle types of distresses on the road.

Figure 1. a) Manual measurement of marked cracks; b) MLS equipment.

Figure 2. Adjusted point intensity in a section of the pavement.

Figure 3. Comparison between ground truth crack points (left) and SVM predicted points after denoising (right).

Figure 4. Comparison between ground truth cracks (test point set) and SVM (left) or Random Forest (right) predictions using Convex Hulls.

Table 3. Intersection over union between cracks detected by the models and ground truth.