LINE-BASED CLASSIFICATION OF TERRESTRIAL LASER SCANNING DATA USING CONDITIONAL RANDOM FIELD

This paper describes a line-based classification method that labels TLS point clouds as vertical objects, ground, trees and low objects. A local classifier labels each site independently of its neighborhood, so its inference often suffers from similar local appearance across different object classes. In this paper, we describe an approach that uses contextual information as a post-classification improvement to a local generative classifier. The contextual information is expected to compensate for ambiguity in the objects' visual appearance. The generative classifier is built with a Gaussian Mixture Model (GMM), whose parameters are iteratively optimized with Expectation-Maximization (EM). The model we use to incorporate contextual information is the Conditional Random Field (CRF), which improves the classification results obtained from the GMM-EM classifier by incorporating neighborhood interactions among labeled objects as well as local appearance. The proposed method was validated by cross validation on three TLS datasets acquired with a RIEGL LMS-Z390i scanner.


INTRODUCTION
Recently, 3D photo-realistic modeling of urban space has been attracting much attention from the photogrammetric and computer vision communities, as demand is increasing from applications such as urban planning, augmented reality and personal navigation. A virtual urban space requires a 3D geometric representation not only of rooftops seen from above (LOD1 and LOD2), but also of street-level scenes (LOD3). Terrestrial Laser Scanning (TLS) is a relatively new surveying tool that, owing to its close range, high point density, accuracy and cost-effectiveness, has been rapidly adopted for modeling urban street scenes. The urban street environment is composed of various street objects as well as moving objects, with a large degree of occlusion and shadow. Classifying such complex urban street scenes in an automated manner remains a challenging vision task.
Supervised classification is a machine learning method that learns mathematical models from training data. In the supervised learning process, a set of features representing unique properties of the target classes plays a key role in building a classifier that is less sensitive to scene variations. Typical features are appearance-based properties, such as color, shape, geometry and texture, which make an object of interest distinguishable from the others. These features are analyzed within a homogeneous local space (e.g., per point, line, plane or other type of primitive), as the object scale is usually not known in advance. Amongst those primitives, lines are easy to extract and widely used for object understanding (line drawing analysis) from image sequences or point clouds. Moreover, TLS emits and records laser shots along a scan line. Thus, line primitive-based scene analysis is well suited to "per-scan line" classification, which might suit real-time monitoring applications.
In this paper we present a line segment-based classification of point clouds acquired by TLS using the Conditional Random Field (CRF). The proposed classifier aims to identify four object classes, vertical objects, ground, trees and low objects, from TLS data. Working at the line level, these objects are represented by linear features attributed with length, orientation, height and range. In our approach, line segments are first extracted in single scan lines (laser profiles). To demonstrate the role of contextual information in classification, we used a generative classifier as a baseline to obtain an initial prediction, which was then improved by adding contextual information. In this generative classifier, we used the Gaussian Mixture Model (GMM) to model the class-conditional probability, and its parameters were learned from training data using the Expectation-Maximization (EM) algorithm. However, a local classifier is prone to errors caused by similar local appearance across classes. In order to rectify these errors while still maintaining the benefits of the local generative classifier, semantic context was introduced as an additional constraint that enforces local label agreement. By incorporating local appearance as well as contextual information, the CRF model is able to improve the previous classification results. We used the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method (Liu & Nocedal, 1989) to optimize the weights in the CRF model. For inference of the model configuration, we used the Loopy Belief Propagation (LBP) algorithm, which is a standard technique for approximate inference on graphs with cycles (Murphy et al., 1999).
The paper is organized as follows. Section 2 discusses previous works relevant to the current research. In Section 3 we describe the details of line segmentation per scan line and feature extraction. We then present our CRF methodology in Section 4 and discuss our experimental results in Section 5. Finally, we draw our conclusions and give an outlook on future work in Section 6.

RELATED WORKS
Most existing methods for TLS data classification have focused on geometric features extracted from laser point clouds. According to the scale at which geometric features are extracted, the methods can be categorized into two types: point-based and surface-based. Point-based methods classify individual laser points directly, using a feature vector extracted from each point's local neighbors (Triebel et al., 2006; Munoz et al., 2008). Surface-based classification algorithms, in contrast, first segment the laser scanning data into homogeneous surfaces and then classify by labeling these surfaces (Vosselman et al., 2004; Belton & Lichti, 2007; Pu & Vosselman, 2009). Both point-based and surface-based classification methods are typically implemented in 3D volumetric space. This may require a computationally expensive process for constructing a relational network or segmenting surfaces over a large number of points. It is, however, straightforward to segment line primitives and construct a relational graph per scan line, instead of working on laser points or segmented surfaces, so as to improve computational efficiency (Jiang and Bunke, 1994).
Sithole & Vosselman (2003) partitioned airborne laser scanning data into two families of orthogonal profiles running along the x and y directions and then linked points if they conformed to rules on, for example, height and slope. Unlike the methods mentioned above, we classify line segments rather than individual points or 3D supervoxels. We therefore created a line adjacency graph (LAG) to represent the relationships among line segments. The LAG is considered within each scan line, which means that the graph is only constructed over line segments located in the same scan line. Relationships across scan lines were not considered here. For the line segment-based classification, linear features were extracted.

3.1 Scan line generation
Prior to line segment extraction, the whole scan was split into scan lines. The TLS data is assumed to be observed sequentially in a discrete-time fashion: the scan is a sequence of scan lines, and each scan line is considered as a stream of observed points. The width of each scan line is set to the scanning angle precision, here 0.05 degree. The TLS data is thus split into a set of vertically continuous scan lines at azimuth intervals of 0.05 degree (the azimuth angle refers to the horizontal alignment). Figure 1 shows an example of a scan line.
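The splitting step can be sketched as follows. This is only an illustration under stated assumptions: points are given as Cartesian (x, y, z) tuples with the scanner at the origin, and the function name `split_into_scan_lines` and the binning by integer azimuth index are hypothetical choices, not the implementation used in this work.

```python
import math
from collections import defaultdict


def split_into_scan_lines(points, width_deg=0.05):
    """Group (x, y, z) points into scan lines by azimuth bin.

    Assumes the scanner sits at the origin. width_deg is the angular
    width of one scan line (0.05 degree in the text).
    """
    scan_lines = defaultdict(list)
    for x, y, z in points:
        azimuth = math.degrees(math.atan2(y, x)) % 360.0
        index = int(azimuth // width_deg)
        scan_lines[index].append((x, y, z))
    # Sort each scan line by elevation angle so that it reads as a
    # vertical profile from bottom to top.
    for index in scan_lines:
        scan_lines[index].sort(
            key=lambda p: math.atan2(p[2], math.hypot(p[0], p[1]))
        )
    return dict(scan_lines)
```

Each value of the returned dictionary is one stream of points, ordered from the lowest to the highest elevation angle.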

3.2 Line segment generation
In urban environments, structured objects, such as planar objects (building facades, ground) and cylindrical objects (lamp posts), typically have a continuous and smooth appearance. Therefore, neighboring points reflected from structured objects in the same scan line have similar range values. In contrast, points from unstructured objects, such as trees, show large range differences. Here, points with a large range difference to their neighbors are called "scattered points" and points with a small range difference are called "smooth points" (Manandhar & Shibasaki, 2001). If a sufficiently small range difference threshold is assigned, "continuous smooth points" (i.e., smooth points that are closely placed) can be clustered into one line segment. We used the range analysis of Manandhar & Shibasaki (2001) to extract line segments. Figure 2(a) shows the scattered points and smooth points, and Figure 2(b) shows the line segments produced from continuous smooth points. It is observed that points from two objects can fall into one line segment; for example, building points and ground points can be captured by one line segment. To fix this problem, the Douglas-Peucker algorithm is used to subdivide the line segments.
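The two steps above, clustering of smooth points and Douglas-Peucker subdivision, can be sketched as follows. The thresholds `max_diff`, `min_points` and `tol` are hypothetical values chosen for illustration, and the profile is assumed to be given in the 2D scan-line plane as (horizontal distance, height) pairs; neither assumption comes from the paper.

```python
import math


def cluster_smooth_points(ranges, max_diff=0.1, min_points=3):
    """Return (start, end) index pairs of runs of 'smooth' points.

    Consecutive points stay in the same run while their range
    difference does not exceed max_diff (hypothetical, in metres).
    Runs shorter than min_points are treated as scattered points.
    """
    segments, start = [], 0
    for i in range(1, len(ranges) + 1):
        run_ends = i == len(ranges) or abs(ranges[i] - ranges[i - 1]) > max_diff
        if run_ends:
            if i - start >= min_points:
                segments.append((start, i - 1))
            start = i
    return segments


def _point_line_distance(p, a, b):
    """Perpendicular distance of 2D point p from the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / norm


def douglas_peucker_breaks(profile, tol=0.2):
    """Indices at which a 2D profile polyline is split (Douglas-Peucker)."""

    def recurse(first, last):
        worst, worst_d = None, tol
        for i in range(first + 1, last):
            d = _point_line_distance(profile[i], profile[first], profile[last])
            if d > worst_d:
                worst, worst_d = i, d
        if worst is None:
            return []
        return recurse(first, worst) + [worst] + recurse(worst, last)

    return [0] + recurse(0, len(profile) - 1) + [len(profile) - 1]
```

For an L-shaped profile in which ground points run into a wall, the break index returned by `douglas_peucker_breaks` falls at the corner, which separates the ground part from the building part of the segment.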

3.3 Feature extraction
The feature vector was extracted from the characteristics of a line segment rather than of individual points. In this work, two types of features were extracted: geometric features (length and orientation) and location features (height and range). Before feature extraction, the points located on the same line segment were fitted with a least-squares line.
Due to their planar characteristics, line segments extracted from vertical objects, such as buildings, and from the ground are very long. These objects are mainly man-made, so their orientations are typically vertical or almost horizontal. In contrast, due to their sparse distribution and irregular shape, line segments of trees and low objects are usually very short and do not have a regular orientation. Length describes the line segment's longest extension in 3D space. After straight line fitting, the normal vector of the estimated line was obtained. The distance between the two endpoints along the normal vector is regarded as the length of the line segment. The orientation of a line segment is defined as the inner angle between the line segment's normal vector and the Z axis.
We observed that the spatial arrangement of urban objects often shows typical patterns (rules); for example, buildings, trees and other objects should be higher than the ground. Objects with the same label have similar height distributions and similar distances from the scanner's center. The height of a line segment is defined as the maximal Z value of the line's member points. The range was calculated as the distance between the centroid of a line segment and the scanner center.
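A minimal sketch of the four features is given below. It makes several simplifying assumptions that are not taken from the paper: the segment is given in the 2D scan-line plane as (horizontal distance, height) pairs with the scanner at the origin, the line is fitted by orthogonal (total) least squares in closed form, and the fitted line direction is used as the reference vector for length and orientation. The function name `segment_features` is hypothetical.

```python
import math


def segment_features(profile):
    """Length, orientation, height and range of one line segment.

    profile: list of (d, z) points, where d is the horizontal distance
    to the scanner and z the height (scanner assumed at the origin).
    """
    n = len(profile)
    cd = sum(d for d, _ in profile) / n
    cz = sum(z for _, z in profile) / n
    s_dd = sum((d - cd) ** 2 for d, _ in profile)
    s_zz = sum((z - cz) ** 2 for _, z in profile)
    s_dz = sum((d - cd) * (z - cz) for d, z in profile)
    # Principal axis of the 2x2 scatter matrix = fitted line direction.
    theta = 0.5 * math.atan2(2.0 * s_dz, s_dd - s_zz)
    ud, uz = math.cos(theta), math.sin(theta)
    # Project the member points onto the fitted direction.
    proj = [(d - cd) * ud + (z - cz) * uz for d, z in profile]
    return {
        "length": max(proj) - min(proj),
        # Angle to the Z axis: 0 for vertical, 90 for horizontal segments.
        "orientation_deg": math.degrees(math.acos(min(1.0, abs(uz)))),
        "height": max(z for _, z in profile),
        "range": math.hypot(cd, cz),
    }
```

Under these assumptions a facade segment yields an orientation near 0 degrees and a ground segment an orientation near 90 degrees, which matches the expectation that man-made objects are nearly vertical or nearly horizontal.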

CONDITIONAL RANDOM FIELD (CRF)
Suppose we are given a set of N line segments $x_1, x_2, \dots, x_N$ extracted from one scan line. The classification task is essentially to find a label $y_i \in \{1, \dots, K\}$ for each line segment $x_i$. In this research, we are interested in four classes, vertical object, ground, tree and low object, so each label takes a value in {V, G, T, L}. We model the line segment classification in a probabilistic framework, which chooses the class labels by maximizing the probability of the class labels Y given the observed data X, P(Y|X). The CRF model is a natural way to incorporate neighborhood interactions among the labels as well as the observed data. Taking the feature vectors extracted from the line segments as the observed variable X and the corresponding unknown class labels as the hidden variable Y, the conditional random field models the classification task as estimating the posterior probability P(Y|X) directly. In this research, the graph is considered within each scan line, which means that the graph is only constructed over the line segments located in the same scan line. Relationships across scan lines were not considered here.
Let G = (V, E) be the graph over line segments. Each line segment is regarded as one node in the graph. If the nearest distance between two line segments is smaller than a certain threshold (here set to 1 meter), an edge is created to connect them. Note that, unlike the graph model of an image, this graph does not follow a regular grid pattern. The CRF model is globally conditioned on the observation X. Given the fundamental theorem of random fields, the conditional distribution over the labels Y given the observed data X has the general form in Equation (1):

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_{i \in S} A_i(X, y_i) + \sum_{i \in S} \sum_{j \in N_i} I_{ij}(y_i, y_j) \Big) \qquad (1)$$

where i indicates the site, S is the set of all sites and $N_i$ is the neighborhood of node i. X is the set of all observed feature vectors. Each feature vector consists of a combination of feature descriptors extracted from a line segment. Y is the set of class labels associated with the observed feature vectors X. P(Y|X) is the posterior probability to be estimated. Z(X) is the partition function. $A_i(X, y_i)$ and $I_{ij}(y_i, y_j)$ stand for the association potential and the interaction potential, respectively, which are detailed in the following parts.
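The graph construction rule can be illustrated with a brute-force sketch. It assumes each line segment is represented by the list of its member points as (x, y, z) tuples and measures the nearest distance between two segments as the smallest point-to-point distance; the function names are hypothetical, and a real implementation would likely use a spatial index rather than the quadratic search shown here.

```python
import math
from itertools import combinations


def nearest_distance(seg_a, seg_b):
    """Smallest Euclidean distance between member points of two segments."""
    return min(math.dist(p, q) for p in seg_a for q in seg_b)


def build_line_graph(segments, max_gap=1.0):
    """Undirected edges (i, j) between segments closer than max_gap metres.

    segments: list of line segments of ONE scan line, each a list of
    (x, y, z) points. max_gap = 1.0 follows the threshold in the text.
    """
    edges = []
    for i, j in combinations(range(len(segments)), 2):
        if nearest_distance(segments[i], segments[j]) < max_gap:
            edges.append((i, j))
    return edges
```

Because edges depend only on spatial proximity, a node can have any number of neighbors, so the resulting graph is irregular and may contain cycles.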

4.1 Association potential
The association potential measures how likely the class label $y_i$ is to be assigned to the single node i given the global observations X, ignoring the other nodes. It is related to the conditional probability $P'(y_i \mid X)$ of class $y_i$ given the data X in Equation (2):

$$A_i(X, y_i) = \log P'(y_i \mid X) \qquad (2)$$

Theoretically, the posterior probability of any local classifier can be used. In this experiment, the posterior probability is obtained using a local generative classifier in Equations (3) and (4):

$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\, P(y_i)}{P(x_i)} \qquad (3)$$

$$P(x_i) = \sum_{k=1}^{K} P(x_i \mid y_i = k)\, P(y_i = k) \qquad (4)$$

where $P(x_i)$ is merely a scaling factor that ensures the posterior probabilities sum to one. Therefore, the major problem in the Bayesian classifier is how to estimate the likelihood $P(x_i \mid y_i)$ and the prior probability $P(y_i)$. The prior probabilities are simply assigned equal values here. Due to the complexity of urban objects, the actual probability density function is multimodal. Therefore, a Gaussian mixture approximation is a reasonable way to model the likelihood, which is expressed as follows:

$$P(x_i \mid y_i) = \sum_{m=1}^{M} \alpha_m\, \mathcal{N}(x_i; \mu_m, \Sigma_m) \qquad (5)$$

where $\mathcal{N}(x_i; \mu_m, \Sigma_m)$ is the m-th Gaussian mixture component, $\alpha_m$ is the corresponding weight, and M indicates the number of mixture components. The value of $\alpha_m$ ranges from 0 to 1 for all components, and the sum of the $\alpha_m$ equals 1. The parameters $\theta = \{\alpha_1, \dots, \alpha_M, \mu_1, \dots, \mu_M, \Sigma_1, \dots, \Sigma_M\}$ are estimated from the training data. Bilmes (1998) gives practical details on the EM algorithm for classification using GMM. In this research, we use the same number of components, three, for all class-conditional probabilities.
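The Bayes step with equal priors can be sketched as follows, assuming the mixture parameters of each class have already been estimated by EM. For brevity the sketch uses diagonal covariances, which is a simplification not stated in the paper, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def gmm_likelihood(x, components):
    """P(x | y) as a weighted sum of Gaussian components.

    components: list of (alpha, mean, var) tuples whose alphas sum to 1.
    """
    return sum(alpha * gaussian_pdf(x, mean, var) for alpha, mean, var in components)


def class_posteriors(x, class_models):
    """Posterior P(y | x) by Bayes' rule with equal class priors.

    class_models: dict mapping a class label to its list of components.
    """
    prior = 1.0 / len(class_models)
    joint = {
        label: prior * gmm_likelihood(x, comps)
        for label, comps in class_models.items()
    }
    evidence = sum(joint.values())  # scaling factor P(x)
    return {label: value / evidence for label, value in joint.items()}
```

The logarithm of each returned posterior can then serve as the association potential of the corresponding label. With equal priors, the posterior is simply the class likelihood normalized over all classes.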

4.2 Interaction potential
The interaction potential can be seen as a measure of how the labels at neighboring sites should interact given the observed data (Kumar and Hebert, 2006). In Equation (6), the interaction potential $I_{ij}$ provides a possibility to model the interaction of contextual relations of neighboring nodes. For each edge connecting two nodes i and j, an edge feature vector $\mu_{ij}$ depending on the observed data is generated. The generalized linear model (GLM) is usually utilized to model the interaction potential over the edge feature vector $\mu_{ij}$ in Equation (7).
When two neighboring nodes have different class labels, the interaction potential is expected to impose a penalty, whereas matching labels are preferred. The degree of penalization depends on the edge feature vector $\mu_{ij}$ and the weight vector v, which is learned from training samples. Here, we use two methods to generate edge features: subtracting and concatenating the features of the two nodes.
We use the difference between the location features (range and height) of two adjacent line segments because we assume that objects that are spatially closer are more likely to have the same label. It is also noticed that, due to occlusion and surface complexity, objects such as facades and ground could be over-segmented into several short line segments, which are likely to be misclassified as tree or low objects. However, when the geometric features (length and orientation) of two adjacent nodes are concatenated, an edge connecting short building and ground line segments still has large values in its geometric features, whereas tree and low objects do not show this kind of combination effect. In the current study, the final CRF model is rewritten as Equation (8), in which the association potential is the posterior probability obtained from the GMM-EM classifier.
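The construction of an edge feature vector from two node feature vectors can be sketched as follows. The dictionary keys, the ordering of the entries and the use of the absolute difference (so that the result does not depend on the sign of the subtraction) are assumptions made for illustration only.

```python
def edge_features(node_i, node_j):
    """Edge feature vector for two adjacent line segments.

    Each node is a dict with the keys 'length', 'orientation',
    'height' and 'range'. Location features are subtracted;
    geometric features are concatenated.
    """
    location_diff = [abs(node_i[k] - node_j[k]) for k in ("height", "range")]
    geometric_concat = [
        node_i["length"], node_i["orientation"],
        node_j["length"], node_j["orientation"],
    ]
    return location_diff + geometric_concat
```

In this layout the first two entries are small for segments that lie close together, while the last four keep the individual geometry of both segments available to the learned weights.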

4.3 Parameter learning and inference
There are two groups of parameters in Equation (8): parameters for the association potential and parameters for the interaction potential. The parameters involved in a CRF could be learned at the same time by maximizing their conditional log-likelihood. However, in this research we use a generative classifier as the input of the association potential, which makes simultaneous parameter learning intractable. Therefore, we decomposed the parameter learning into two stages. In the first stage, the parameters of the Gaussian mixture model were learned from the labeled training data. Once this was done, the log posterior probabilities were used as the association potential. In the second stage, the weights of the edge feature vector were learned using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method, a member of the broad family of quasi-Newton optimization methods that uses limited memory to approximate the inverse Hessian matrix; details can be found in Liu & Nocedal (1989). In our formulation, we simply have one weight w, which represents the tradeoff between spatial regularization and our confidence in the classification.
The final task of classification with a CRF is inference, which can be seen as finding the best configuration with respect to some cost function. The graph we constructed is cyclic, and exact inference over this structure is an intractable problem (Koller & Friedman, 2009). Loopy belief propagation (LBP) is an approximate inference method for graphs with cycles (Murphy et al., 1999). Computing the approximate gradient using LBP and learning the CRF model parameters with a stochastic gradient-based optimization method has been shown to work well by Vishwanathan et al. (2006). Following the work of Vishwanathan et al. (2006), we use L-BFGS for parameter learning and LBP for configuration inference.
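To make the inference step concrete, the following is a minimal sum-product LBP sketch on a pairwise graph. It is an illustration under stated assumptions rather than the inference code used in this work: potentials are passed as log-scores, messages are updated synchronously for a fixed number of iterations, and the label of each node would be read off as the argmax of its approximate marginal. The name `loopy_bp` and its arguments are hypothetical.

```python
import math


def loopy_bp(unary, edges, pairwise, n_iter=20):
    """Sum-product loopy belief propagation on a pairwise graph.

    unary[i][k]: association potential (log-score) of label k at node i.
    edges: list of undirected (i, j) pairs.
    pairwise(i, j, a, b): interaction potential (log-score) for label a
        at node i and label b at node j.
    Returns approximate marginals, one normalized list per node.
    """
    n_labels = len(unary[0])
    neighbors = {i: [] for i in range(len(unary))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # messages[(i, j)][b]: message from node i to node j about label b.
    messages = {
        (i, j): [1.0 / n_labels] * n_labels
        for i in neighbors for j in neighbors[i]
    }
    for _ in range(n_iter):
        updated = {}
        for (i, j) in messages:
            out = []
            for b in range(n_labels):
                total = 0.0
                for a in range(n_labels):
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][a]
                    total += math.exp(unary[i][a] + pairwise(i, j, a, b)) * incoming
                out.append(total)
            norm = sum(out)
            updated[(i, j)] = [v / norm for v in out]
        messages = updated
    beliefs = []
    for i in range(len(unary)):
        belief = [math.exp(unary[i][a]) for a in range(n_labels)]
        for k in neighbors[i]:
            belief = [b * m for b, m in zip(belief, messages[(k, i)])]
        norm = sum(belief)
        beliefs.append([b / norm for b in belief])
    return beliefs
```

With a simple agreement-rewarding interaction such as `lambda i, j, a, b: 1.0 if a == b else 0.0`, a node whose local posterior is ambiguous is pulled toward the label of its confident neighbors, which is the smoothing effect the interaction potential is meant to provide.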

5.1 Dataset
The proposed method was validated with static TLS data collected with a RIEGL LMS-Z390i at three different sites: the Seneca building (York University), the Passy residence (York University) and one building in the Distillery district (downtown Toronto). All three datasets contain our objects of interest: vertical objects, ground, tree and low object. Moreover, they have similar distributions of appearance, location and arrangement. To comprehensively evaluate the role of contextual information in classification, we mainly present the results of two methods: 1) GMM-EM, which can be regarded as the CRF with only the association potential, and 2) the CRF with both association and interaction potentials.
K-fold cross validation was used here. There are 5323 scan lines in total in the three datasets. First, the 5323 scan lines were randomly divided into 5 subsets of equal size. Each time, the parameters of the local generative classifier model and the CRF model were learned on 4 subsets and then tested on the retained test subset, which was repeated five times. Classification performance was measured individually and then averaged. Note that the cross validation was not used for parameter learning but only for assessing how the trained model will generalize to an independent test dataset. The classification accuracy is estimated at the line segment level. First, all points were manually labeled as ground truth. The ground truth of a line segment was then assigned as the majority vote of its member points' labels.
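The fold assignment and the majority-vote ground truth can be sketched as follows. The random seed and the helper names are hypothetical; since 5323 is not divisible by 5, the folds in this sketch differ in size by at most one scan line.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=5, seed=0):
    """Randomly partition item indices into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def majority_label(point_labels):
    """Ground-truth label of a line segment from its member points."""
    return Counter(point_labels).most_common(1)[0][0]


folds = k_fold_indices(5323, k=5)
for test_fold in range(5):
    test_lines = folds[test_fold]
    train_lines = [i for f in range(5) if f != test_fold for i in folds[f]]
    # Train the GMM-EM and CRF models on train_lines, evaluate on test_lines.
```

Each scan line appears in exactly one test fold, so every line segment contributes once to the averaged accuracy.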

5.2 Qualitative evaluation
We evaluate the proposed CRF model both qualitatively and quantitatively. The overall classification result for the three datasets is presented in Figure 3. The representative scan line shown in Figure 4 is taken from the Seneca building, York University. In Figure 4, classification errors caused by similar local appearance are clearly visible in the result of the local classifier, which is a typical drawback of a local likelihood model. The local generative classifier model cannot make a spatially coherent prediction. For example, building is found in tree, tree is found in building, and low object is found right below building. The CRF solution, however, rectified this kind of misclassification by considering the neighborhood interactions in the data.

5.3 Quantitative evaluation
For the quantitative evaluation, confusion matrices were created for the two classifiers, and the omission error, commission error and overall classification accuracy were compared. It is noted that the contextual information hardly affects the commission error of building and ground but has a great influence on tree and low object, whose commission errors decreased from 8.91% to 2.81% and from 42.81% to 20.20%, respectively. Contextual information makes the omission error of building drop dramatically from 22.94% to 6.72%. The limitation of this contextual information is that it causes more low objects to be misclassified as other classes, with the omission error increased by 7.14. Moreover, it has little influence on the omission errors of ground and tree. It is clearly observed that contextual information mainly reduces the commission error between vertical object and tree as well as between vertical object and low object. Statistics show that 9848 line segments changed their labels, accounting for 8.26% of the total. Of these 9848 transitions, 8237 (83.64%) are positive, which means that the additional contextual constraint changes the prediction from false to true. We also observe that many correctly classified trees were changed to low object and some correctly classified low objects were changed to building. The latter explains the relatively large increase in the omission error of low objects.
This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-7-W2-155-2013
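For reference, the error measures used above can be computed from a confusion matrix as sketched below. The sketch assumes the usual remote sensing definitions (omission error as the share of reference segments of a class that received another label, commission error as the share of segments predicted as a class that belong to another class) and a matrix with reference classes in rows and predicted classes in columns; the function name is hypothetical.

```python
def error_rates(confusion, labels):
    """Overall accuracy plus per-class omission and commission errors.

    confusion[r][p]: number of line segments with reference class r
    that were predicted as class p (rows and columns ordered as labels).
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    rates = {}
    for i, label in enumerate(labels):
        reference_total = sum(confusion[i])
        predicted_total = sum(row[i] for row in confusion)
        rates[label] = {
            "omission": 1.0 - confusion[i][i] / reference_total,
            "commission": 1.0 - confusion[i][i] / predicted_total,
        }
    return correct / total, rates
```

Applied to the confusion matrices of the GMM-EM and CRF classifiers, this yields the per-class figures compared in this section.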

CONCLUSION
This work approaches the problem of semantic classification of TLS LiDAR data. We proposed to use a classic discriminative contextual classifier, the CRF, to classify TLS data. The CRF model introduces neighborhood interactions among the labels as well as the observed point cloud. By maximizing object label agreement according to contextual coherence, the CRF model compensates for ambiguity in the objects' local appearance. The performance of the baseline classifier and of this discriminative contextual classifier was evaluated. The experimental results show that classification accuracy improves when object label agreement is considered, which validates the advantage of the discriminative contextual classifier model. As semantic classification of TLS data is still an active topic, much work remains to be done. In the future, we hope to introduce new association and interaction features, such as intensity and color, to further improve the classification accuracy. We are also interested in exploring new ways to construct the interaction potential, such as the spatial arrangement of objects. In addition, we hope to find a new parameter learning algorithm to ensure that the parameters not only fit the training data but also generalize to unseen test data.
Zhao et al. (2010) used a line segment-based classification for TLS data collected with a single-row laser scanner. In their work, planar objects, such as buildings and roads, are extracted as straight line segments, and free-form objects, such as trees, are extracted as small line segments or irregular points. Hu & Ye (2013) used the Douglas-Peucker algorithm to segment ALS scan lines into line segments and classified them into buildings and vegetation based on local analysis using simple rules. With regard to context-based classification, the CRF is a natural way to model contextual relations amongst relational features (objects). It was originally proposed by Lafferty et al. (2001) to label sequential data. CRFs belong to the family of graphical models and represent data as a graph structure consisting of nodes and edges. Recently, many works on classifying laser scanning points using CRFs have been published. Lim & Suter (2009) presented a method to classify 3D outdoor terrestrial laser scanning data using a multi-scale CRF model. The graph was constructed over specifically designed 3D supervoxels. Rusu et al. (2009) labeled indoor point clouds using point-wise CRFs according to the geometric surface the points belong to, such as cylinders or planes. Shapovalov et al. (2010) classified point clouds obtained from airborne laser scanning using a CRF. They first performed segmentation on the point cloud and then classified the segments.
Figure 1. Scan line generation. (a) Points in the orange mask are captured by scan line 2. (b) Example of a scan line.
To estimate the parameters of the Gaussian mixture model, the classic Expectation-Maximization (EM) algorithm is applied to the labeled training data (Bilmes, 1998).

Figure 3. Visualization of classification by CRF. (a) Seneca building, York University; (b) Passy residence, York University; (c) building in the Distillery district, Toronto. Red: vertical objects; blue: ground; green: tree; purple: low object.
To show the qualitative behavior of the CRF model, we choose a representative scan line and compare the classification results of the two methods, as shown in Figure 4.

Figure 4. Classification result of the representative scan line obtained from GMM-EM (left) and CRF (right); Red: Vertical objects; Blue: Ground; Green: Tree; Purple: Low object.
The overall classification accuracy of the two methods on each fold is shown in Table 1. The advantage of the contextual information is clear: the overall classification accuracy increased by nearly 6%. The omission and commission errors of each class, which describe the results in more detail, are shown in Figure 5.
Table 1. Cross-validation results obtained from GMM-EM and CRF classification.

Figure 5. Per-class omission and commission errors caused by GMM-EM (left) and CRF (right).

Figure 6. Label transition.
Figure 6 shows the label (state) transitions from the GMM-EM prediction to the CRF prediction in detail, indicating how the contextual information improves classification performance.
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-7/W2, 2013. ISPRS2013-SSG, 11-17 November 2013, Antalya, Turkey