DETECTING LINEAR FEATURES BY SPATIAL POINT PROCESSES

This paper proposes a novel approach for linear feature detection. The contribution is twofold: a novel model for spatial point processes and a new method for linear feature detection. It describes a linear feature as a string of points, represents all features in an image as a configuration of a spatial point process, and formulates feature detection as finding the optimal configuration of a spatial point process. Further, a prior term is proposed to favor straight linear configurations, and a data term is constructed to superpose the points on linear features. The proposed approach extracts straight linear features in a global framework. The paper reports ongoing work. As demonstrated in preliminary experiments, globally optimal linear features can be detected.


INTRODUCTION
Image features play a fundamental role in many tasks of image analysis.For example, both surface reconstruction and object extraction usually rely on some distinguishable features.Feature detection has been investigated widely in the communities of photogrammetry and remote sensing, computer vision and pattern recognition.However, few works were dedicated to modeling feature shape in a global framework.This paper address this issue in the context of detecting linear features.Linear features offer salient cues for region boundaries and object contours.

Related Work
Image features refer to image primitives such as points, lines, curves and regions.Corner detectors and interest point detectors have been developed to detect point features (Förstner and Gülch, 1987, Harris and Stephens, 1988, Tomasi and Kanade, 1991, Shi and Tomasi, 1994, Smith and Brady, 1997, Lowe, 2004, Rosten and Drummond, 2006, Bay et al., 2008, Rublee et al., 2011).Region detectors have been proposed to detect region features (Kadir et al., 2004, Tuytelaars and Van Gool, 2004, Leutenegger et al., 2011, Alahi et al., 2012).Both straight lines and curves are linear features.Their detection is formulated as edge detection, line detection, boundary detection or contour detection.
Edge detection aims at detecting edges, which refer to abrupt changes of brightness.Basically, sharp changes are computed by differential operations, which is achieved by differentiation based filters such as Sobel, Prewitt, Roberts, Laplacian of Gaussian, etc.These filters are simple and sensitive to noise.This disadvantage is significantly overcome by the Canny edge detector (Canny, 1986).Its success depends on the definition of three comprehensive criteria, namely good detection, good localization and single-pixel response, for the computation of edges.Edge detection is formulated as an optimization with respect to these criteria.Benefiting from its solid mathematical foundation, this detector has proven to be reliable and has gained popularity since it was first published.Besides, there are some extended and adapted versions (Deriche, 1987, McIlhagga, 2011, Xu et al., 2014).However, feature shape is not modeled in the computational framework.
Lines are pairs of anti-parallel edges.Depending on the width of the linear feature to be detected and the contrast with respect to the neighborhood, line detection can be advantageous to edge detection.The most prominent example of line detection is the Steger operator (Steger, 1998).
Boundary detection refers to detecting boundaries between different regions.Boundaries are indicated by edges, excluding those inside a region.However, there are typically a lot of edges inside a textured region.This issue is addressed by the Pb (probabilityof-boundary) edge detection algorithm (Martin et al., 2004).For each pixel it fuses brightness gradient, texture gradient and color gradient to calculate the probability of being a boundary pixel.Further, MS-Pb (Multiple Scale Pb) integrates Pb in multiple scales (Ren, 2008), and gPb (global Pb) introduces global information into the MS-Pb (Arbelaez et al., 2011).The gPb detector can detect boundaries between texture regions.However, none of them models boundary shape.
Contour detection is dedicated to extracting outlines of objects of interest.Some approaches group edges, lines or boundary fragments into contours (Zhu et al., 2007, Arbelaez et al., 2011, Ming et al., 2013, Payet and Todorovic, 2013).The other approaches fit active contour models to images (Kass et al., 1988, Mishra et al., 2011, Dubrovina et al., 2015).Although shape is modeled and utilized by these approaches, modeling is performed not in a global framework.Spatial point processes are stochastic models for spatial entities.The relationships among entities can be considered by a prior model, and, the shapes of the entities can be represented by some geometric marks.For example, linear features and line-networks are represented by line segments and polylines (Stoica et al., 2004, Lacoste et al., 2005, Lacoste et al., 2010, Chai et al., 2013, Schmidt et al., 2015), building outlines are represented by rectangles (Ortner et al., 2007, Ortner et al., 2008), and tree crowns are represented by circles and ellipses (Perrin et al., 2005).But these marks are too specific to represent general linear shapes.Although a combination of above marks integrates their capacities (Lafarge et al., 2010), they are not general enough to represent freeform linear features.Alternatively, stochastic models can be employed to segment an image into regions, whose boundaries are freeform linear features (Tu and Zhu, 2002).However, regions instead of boundaries are modeled explicitly.

Motivation and Contribution
Most feature detectors are not developed in a global framework.Also, they do not deal with feature shape and uncertainty.While marked point processes do provide a global model for feature shape and uncertainty, in existing approaches linear features are represented by geometric marks.Apart from losing generality, the space of solutions is expanded significantly since additional parameters are involved in decribing the geometric marks.These additional parameters impair both detection quality and efficiency.
With reference to Fig. 1 a point is an atomic element in mathematics, and a string of points forms a (possibly straight) curve.In turn, a set of connected curves encloses a region.Motivated by this common sense observation, in this paper a linear feature is represented by a string of points.Further, the paper proposes a spatial point process, in which points are inclined to form curves as shown in Fig. 1(c).The paper makes several contributions: • Spatial point process: Based on the proposed model for a spatial point process, points can form linear shapes.This characteristic assures its potential in modeling and analyzing linear shapes.
• Feature representation: The proposed model is general enough to represent freeform linear features.Edges, lines, boundaries and contours can all be represented in this model.
• Global feature detection: Linear feature detection is formulated as a global optimization problem.The globally optimal solution can be reached very efficiently since no extra parameters are involved in the computation.
The rest of the paper is organized as follows.First, some background information on spatial point processes is introduced in Sec. 2 Second, the novel approach for linear feature extraction is proposed in Sec. 3 Then, experimental results are presented in Sec. 4 Finally, conclusions are drawn in Sec. 5

Spatial Point Processes
A spatial point process is a useful model for a random pattern of points as depicted in Fig. 1(a).For a k-dimensional space R k , let F ⊂ R k be a compact set, let ω = {ω1, . . ., ωn} be a configuration having n unordered points ωi ∈ F , let Ωn be the set of configurations that consist of n points.A point process on F is a mapping Ψ from a probability space to the set of configurations Ω = ∞ n=1 Ωn, such that, for all bounded Borel sets S ⊂ F , the number of points NΨ(S) falling in S is a finite random variable.
A Poisson process in the plane with uniform intensity measure ν(.) is a point process in F ∈ R 2 , such that, for every bounded closed set S ∈ F , the number of points falling in this region follows a Poisson distribution with mean ν(S), and the number of points are independent for disjoint regions.
Complex point processes are specified by a probability density h(.) defined on Ω and a reference measure µ(.) under the condition that the normalization constant of h(.) is finite: (1) The measure µ(.) having the density h(.) is usually defined via the intensity measure ν(.) of a homogeneous Poisson process.For a more detailed definition of point processes the reader is referred to (Møller and Waagepetersen, 2004).

Feature Extraction Based on Marked Point Processes
To extract features from images, a data term measuring the consistency between the points and the image is introduced into the density h(.), and a prior term reflecting the spatial interactions among the points is also introduced into the density.This density can be expressed as a product of a data term h d (.) and a prior term hp(.): Feature extraction is achieved by searching for the configuration ω maximizing the probability density as Existing approaches typically employ geometric marks such as line segments to represent shapes.As depicted in Fig. 1(b), the geometric marks are very specific, they cannot represent general shapes.Furthermore, the configuration space Ω is expanded dramatically by the extra parameters for the marks.Both, detection quality and efficiency are impaired within this complex space.

Linear Feature Representation
This paper represents a linear feature by a string of points instead of a geometric mark.As shown in Fig. 1(c), a line segment is represented by a set of points sampled on the segment.Any other linear features such as freeform curves are also represented by their sample points.
The suggested model contains a data term measuring the consistency between the points and the image data.Further, this paper proposes a prior model for spatial point processes such that linear configurations become more probable than non-linear configurations and straight ones are prefered to curved ones.

A Prior Model for Linear Configurations
A prior probability of a configuration ω is measured by three different criteria: the sparsity of its points, the neighborhood of its points and the curvature of the linear configuration.The density hp is a product of three densities: where hs(ω), h nb (ω) and hc(ω) are three densities based on the criteria of sparsity, neighborhood and curvature, respectively.
Figure 2: Sparsity is assured by penalizing closely neighboring points indicated by the overlapping disks.
Sparsity: Configurations of sparse points are assured by penalizing small distances between neighboring points.Let a disk of radius r1 be the influence area of each point as illustrated in Fig. 2. When two points are closer than 2r1, their disks overlap.To avoid overlapping, the distance between any two points must be larger than 2r1.
The density for sparsity is defined as where α is a parameter of this model, s(ω, r1) is calculated as follows: where d(ωi, ωj) is the distance between ωi and ωj.
The above model penalizes small distances between neighboring points.For 0 < α < 1, it is a soft constraint, points are allowed to be closer than 2r1, but they are less probable.Neighborhood: As one tracks points along a line, each point has a preceding point and a succeeding point, excluding the first point and the last point.As depicted in Fig. 3, point ωj has a preceding point ωi and a succeeding point ω k .This is a basic characteristic of linear features; it is introduced into the neighborhood density as follows where β is a parameter of this model, and nb(ω, r2) is calculated as where deg(ωj, r2) is the degree of ωj, i.e. the number of points within a disk of radius r2 around ωj.Note that typically r2 is assumed to be larger than r1.
For 0 < β < 1, points with two neighbors within a disk of radius r2 are preferred and the other points are penalized.

Curvature:
The local curvature at a point of a linear feature is described by the angle δ defined by its preceding and succeeding point as illustrated in Fig. 4. Note that the curvature term is only applied for points with exactly two neighbors, as otherwise its computation in the described way is either not possible (one neighbor) or ambiguous (three or more neighbors).
In our approach linear features are assumed to have small curvatures; this is why we say we prefer straight linear features.In turn, small curvature is indicated by an angle close to π.This angle is introduced into the curvature density as follows where γ is a parameter of the model, and c(ω) is calculated by where ωi and ω k are two neighbors of ωj.

Data Consistency of Linear Features
Assuming the data for the points to be independent, the data density can be written as a product of local terms where the local term h d (ωi) for a point ωi measures its consistency with a linear feature, i.e. the likelihood of being either a corner or an edge.The local term is computed by where λ1 and λ1 are two eigenvalues of the Harris matrix, which is calculated as the covariance matrix M of the derivatives of the image I with respect to the pixel neighborhood where Ix = ∂I ∂x and Iy = ∂I ∂y are two derivatives, and N (ωi) is a window centered at ωi.
As pointed out in (Harris and Stephens, 1988), λ1+λ2 = T r(M ) measuring the flatness at the point.Large values indicate either a corner or an edge.Therefore, this data term can be used to find points at object boundaries.

Linear Feature Extraction
Linear feature extraction is formulated as finding the optimal configuration of a spatial point process.The optimal configuration ω maximizes the probability density This is not a conventional optimization problem since the configuration space has variable dimensions and the probability density is multi-modal.Simulated annealing is employed to successively simulate a series of probability distributions which converges to a distribution concentrated on the optimal configuration (Kirkpatrick et al., 1983).The Reversible Jump Markov Chain Monte Carlo (RJMCMC) sampler is embedded into the optimization procedure to take into account the variable dimension of the configuration space and to simulate the expected probability distribution (Green, 1995).
3.4.1 Searching the Optimal Configuration: Consider a spatial point process, in which each configuration ω has the probability density where T > 0 is a temperature parameter.When T → ∞, hT (ω) defines a uniform distribution on Ω; for T = 1, hT (ω) = h(ω); and as T → 0, hT (ω) concentrates on the optimal configuration ω .When the temperature starts from a large value and decreases according to a logarithmic function, it is guaranteed that the globally optimal configuration is found as the temperature approaches zero.In practice, a faster geometric decrease gives an approximate solution, in general close to the optimum (Baddeley and Lieshout, 1993).
The RJMCMC sampler is adopted to simulate a discrete Markov Chain (Xt) t∈N , which converges to hT (ω).At each transition, one point of the current configuration ω is perturbed to generate a new configuration ω according to a kernel Q(ω → .).The configuration ω is then accepted as the new state of the chain with a probability min(1, R), in which the Green ratio is calculated as The kernel Q can be a mixture of some sub-kernels Qm where pm is the probability of choosing the sub-kernel Qm.The kernel mixture must allow any configuration in Ω to be reached from any other configuration in a finite number of perturbations (irreducibility condition of the Markov chain), and each sub-kernel has to be reversible, i.e. able to propose the inverse perturbation.

Transition Kernels:
The birth and death kernel QBD and the translation kernel QT are developed in the sampler.These two kernels guarantee the irreducibility condition of the Markov chain.
Birth and death kernel: The uniform birth and death kernel inserts a point into the current configuration ω and deletes a point from the current configuration ω, respectively.Adding and removing a point corresponds to jumps into a higher dimensional and a lower dimensional subspace, respectively.The green ratio for a birth kernel is given by where λF is the Poisson parameter representing the expected number of points in the domain F (the whole image), n(ω ) is the number of points in the proposed configuration ω , ω i is the inserted point, and p d (resp.p b ) is the probability of choosing a death (resp.a birth) kernel.In our experiments we set p d = p b = 0.5.In case of a death, the proposition kernel ratio corresponds to the inverse of the birth's ratio.
Translation kernel: The translation kernel moves a point ωi to a new point ω i .The green ratio for a translation kernel is given by In our experiments, a point is moved in a square centered around its old position and the length of each side is set to be r1.

Sampling Driven by Gradients:
The above scheme samples the configuration space uniformly.It spends a lot of time checking impossible configurations.The points are usually expected to lie on edges with large gradients.Motivated by the Data Driven Markov Chain Monte Carlo (DDMCMC) approach (Tu and Zhu, 2002), a scheme utilizing gradient information is developed to improve the sampling.
First, the point must be sampled at the pixel locations, i.e. the sample points stem from the set of pixels: where P is the set of pixels.
Then, each pixel has a weight proportional to its gradient magnitude where Ix = ∂I ∂x and Iy = ∂I ∂y are the two derivatives.The weights of all pixels are normalized such that its sum equals one.
Finally, uniform birth and death is replaced with gradient driven birth and death.In uniform birth and death, each pixel has the same chance of being selected as a new point, while in gradient driven birth and death the pixels having larger gradient have more chance of being selected as a new point.The chance is proportional to its gradient.

EXPERIMENTAL RESULTS
The described method was implemented and some preliminary tests were conducted to obtain a first insight into the behavior of the suggested approach.The experiments consist of model simulation and feature detection, the results are reported in the following.The parameters selected for the experiments are presented in Tab. 1.
pixels) r2 (pixels) 200 0.9 0.9 0.9 5 10  Fig. 6 presents the results of feature detection.The left column presents the original images.The two middle columns depict features detected by the Harris detector (Harris and Stephens, 1988) and the Canny detector (Canny, 1986).The right column depicts features detected by the proposed approach.The detected features are represented by the shown points.The detected points representing the linear features are well placed on the region boundaries and they clearly illustrate the outline of the objects.Moreover, they are general enough to represent any freeform features and objects.
The data term is calculated once for each pixel and is stored as an image for use in the sampling procedure.In this way calculation efforts are minimized and a better efficiency is achieved.

CONCLUSION
This paper proposes a novel approach for linear feature detection.Feature detection is formulated as finding an optimal configuration of a spatial point process.Based on this formulation, a prior model is proposed to favor straight linear configurations, and a data model is constructed to draw the points towards linear features.The proposed approach detects linear features in a globally optimal framework.As demonstrated by the experiments, features can in principle be detected and they sit on the boundaries with good accuracy.
One limitation of the approach is that linear features are not detected explicitly.The model can only serve as an intermediate representation between edges and contours, which are lowlevel and high-level features, respectively.Converting the detected points into contours is worthy to investigate in the future.Further, the model does not include any topological constraints.Therefore, a network containing junctions cannot be properly extracted.Representing junctions without marks is also an issue which needs further attention.Finally, the data term can be improved by introducing a more sophisticate response, e.g. by introducing the concep of probability-of-boundary (Martin et al., 2004).

Figure 1 :
Figure 1: Overview of three kinds of spatial point processes.

Figure 3 :
Figure3: Each point on a line excluding the first and last point has two neighbors within a disk, i.e., a preceding point and a succeeding point.

Figure 4 :
Figure 4: The angle defined by three successive points.

Fig. 5
Fig. 5 presents two simulations based on two models.An optimal configuration based on the homogenous Poisson point process is presented in the left figure.The points are totally random and no linear features appear.In contrast, the right figure presents an optimal configuration based on the proposed model.The points are inclined to form lines and curves as expected.

Figure 5 :
Figure 5: Two simulation results: on the left an optimal configuration of a homogenous Poisson point process is shown, in which points are distributed randomly.To the right a configuration with density defined by the proposed model is illustrated, in which points are distributed along linear curves.

Figure 6 :
Figure 6: Experimental results: The images and features detected by the Harris, the Canny and the new spatial point process based detector are indicated by the captions.The corners and edges are depicted as red and green pixels, respectively.The points on the linear features are also depicted as red crosses.

Table 1 :
Parameters for experiments