AUTOMATED DETECTION AND VECTORIZATION OF ROAD ELEMENTS IN HIGH RESOLUTION ORTHOGRAPHIC IMAGES

ABSTRACT: This paper proposes, describes, and applies an algorithm for the automatic detection of selected elements of road infrastructure, with the option to determine their spatial information. The principle is based on evaluating the color spectrum of the selected object in orthographic images. The source image can be an output of low-altitude aerial photogrammetry or terrestrial laser scanning, optionally combined with a digital elevation model. The approach detects the color composition of the selected road element, clusters the identified elements within the image, and mathematically transforms the clusters into a spatial vector form. Prior to processing, the target objects to be vectorized are filtered based on user input. The outputs take the form of contours or of the basic structure determined for the object. The main difference compared to existing methods is that vectorization is performed only on the selected, pre-filtered parts of the raster image that contain identified target objects, not on the whole image. This makes it possible to identify and analyze automatically, e.g., the edge of the road, road markings, or road features, and to integrate the identified outputs into more complex spatial models of the road or its surroundings. Additionally, the processing of the data to create a digital model of the environment can be automated, with a significant saving of time and related costs.


INTRODUCTION
The use of photogrammetric measurements from unmanned aircraft systems (UASs) or Mobile Laser Scanning (MLS) has become a routine and highly effective way to obtain highly accurate, detailed and up-to-date information in various application fields. Transportation infrastructure is a prime example: road administrators and contractors use the outputs in the design, construction and operational phases of a road (e.g. Pinto et. al., 2020; Palumno et. al., 2017; Yao et. al., 2021; Sui et. al., 2021). The added value of non-selective data acquisition methods lies in the amount of acquired information, the speed of data acquisition and the high resulting accuracy. While the accompanying processing software enables a seemingly easy transfer of the data to commonly used CAD software, in practice the actual extraction of the desired road element characteristics is often an issue. Most of the currently developed tools are aimed at processing the resulting point clouds, which places demands on the computing power of the hardware and makes it necessary to work with very large datasets. Therefore, this contribution presents an alternative approach based on the processing of orthophotos, mainly acquired from UAS photogrammetry.
UAS photogrammetry is an affordable and fast data acquisition approach that can be used with great advantage to document the actual state of the road. The photographs obtained from low-altitude imaging with a UAS are predominantly processed using digital image correlation. Correlation does not require explicit identification of individual points, but uses algorithms for automatic image recognition, identification of common significant image features, and their subsequent linkage for spatial reconstruction. Commonly, the output is a dense point cloud composed of all the points that the computational algorithm was able to identify. Alternatively, dense elevation models of the target area or orthographic images can be generated and used for further processing.
The orthophotos (Figure 1) generated from aerial photographs are characterized by very high detail (Ground Sampling Distance, GSD, of 1 cm or lower) and higher image quality in comparison to orthographic views generated from terrestrial or MLS systems. This is a significant advantage that can be exploited to identify road characteristics or elements, such as horizontal road markings, road edges and traffic islands.
One example is the Road Safety Audit (RSA), where orthophotos can be used to assess road characteristics and serve as the base information source for the design of proposed remedial measures. Working with orthographic images created in this way has a clear advantage, especially in the design phase. Such images contain significantly more information about the current road equipment, road markings and the close surroundings of the road. They can partially replace time-consuming geodetic surveys and are significantly more accurate than the orthophoto maps available in commonly used map applications, since raster maps obtained from unmanned aircraft can achieve almost any Ground Sampling Distance (GSD) and can be generated at any level of detail required in civil engineering projects. The quality of the output depends primarily on the technical equipment of the drone (camera), the approach used to control the solution and its accuracy (ground control points), the weather conditions and the pilot's experience. When all these aspects are considered, it is possible to achieve an accuracy of approximately 1 to 2 cm per pixel (McGlone, 2004).
However, this approach is accompanied by significant time demands on the subsequent extraction of information, since the raster image itself is usually used only as a background. The time is spent redrawing selected road elements into an editable vector form, which is subsequently used for the design of remedial measures. This process can be lengthy, especially for more complex road sections that contain a large number of elements. A typical example is a rural intersection of major roads with traffic islands and different variants of horizontal road markings. Furthermore, longer road sections captured at a fine GSD lead to images with very high resolution.
The aim of this article is to introduce a newly developed method for automatic recognition and vectorization of traffic infrastructure images without the need to use deep learning algorithms. The benefit of this research is the vectorization of only the desired segments or elements in the image, not the entire image. While some form of image vectorization is possible with photographic software such as Adobe Photoshop or GIMP, that approach provides insufficient control over the outputs and limited usability in subsequent activities. The developed software is primarily intended to make the processing and extraction of information from input data into an editable form more efficient and to save time for traffic designers and road safety auditors.

RELATED WORK
Currently, image vectorization is applied in many industries, especially in graphic or design fields. Image modifications were already being carried out in the early 1980s, before vectorization as such could be spoken of, but similarities can be found in the underlying principles. The work mainly involved finding the basic skeleton of an image by means of thinning algorithms. The image, or cluster of algebraic symbols, was simplified using mathematical methods down to a basic pattern of unit width. The principle consisted of two sub-iterations, in which selected points were erased in each cycle, either in the northwest or southeast direction (Zhang et. al., 1984).
In the 1990s, various methods were developed and tested primarily on cadastral maps with the aim of converting drawing documentation into an editable form using vectorization. This process involved gradually smoothing lines and guiding curves through areas of dark pixels in the original map drawing. This made it possible to redraw the edited image and recognize contrasting lines (Janssen et. al., 1997).
Current studies focus specifically on vectorising hand-drawn sketches as a basic tool in the process of digitizing design. A method for vectorising raster line images is presented by Guo et. al. (2019), who address problems related to line drawing and the production of 2D animation. They found that existing vectorization methods either suffer from low accuracy or are unable to handle high-resolution images. The goal of the study was to deal with drawings containing line intersections of varying complexity. A two-phase vectorization method was proposed, which analyses global and local topology in the process of vectorising lines. In the first phase, lines are divided into sub-curves, and in the second phase, the image is reconstructed. Image reconstruction was performed using various methods, which were compared to the two-phase solution. Experimental statistics showed that the two-phase method significantly outperforms existing methods in terms of computation speed, achieves visually better topology, and improves the accuracy of image reconstruction.
Vectorization is an inherently ambiguous problem, as many possible vector images could be derived from the same raster base. However, not all of these vector images would reflect the actual situation as perceived by the observer. For this reason, studies focused on assessing vectorization quality have emerged, such as Yan et. al. (2020). They found a discrepancy between sketches created in the field and the clean, sketch-like drafts required by the algorithms that process them. Cleaning algorithms differ greatly in the assumptions they make about input and output drafts. The study conducted comparative tests of draft cleaning and processing quality on a test data set of dozens of sketches, resulting in a comparative analysis suitable as a basis for further work on image automation and vectorization. Najgebauer et. al. (2019) also dealt with the conversion of artwork into a graphic form. The contribution focused on vectorising black and white sketches using a method for rapid vectorization of line-drawn images based on a multi-stage second derivative accelerator detector using a summation table and an auxiliary grid. The image was initially scanned along the grid lines, with nodes being added to increase accuracy. The use of inertia in tracing lines allowed for better mapping of nodes in a single pass. Vectorization worked efficiently regardless of line thickness or shading, with experiments showing that it was over two orders of magnitude faster than existing methods without sacrificing accuracy.
The use of a convolutional neural network (CNN) for automatic image segmentation was the focus of a study on retinal vessel segmentation (Chala et. al., 2021). A model in the form of a decoder with multiple encoders was proposed, where the architecture consisted of two encoder units with convolutional and max-pooling layers, and a decoder unit with convolutional and deconvolution layers. The algorithm directly takes RGB retinal images as input. The model is trained to create feature maps based on convolutional and deconvolutional operations and non-linear activation functions. The results are compared with existing data, and the performance of the model is evaluated using various metrics, such as F1 score, accuracy, sensitivity, specificity, and precision. The results suggest that the method has the potential for practical applications, such as computer-aided analysis of retinal images, for example in automatic retinal screening.
In a subsequent article (Hettinga et. al., 2022), the authors focused on the use of triangulation and image color as sources for creating vector primitives for image vectorization. This fully automatic method had advantages for rendering performance, texture detail, and efficiency, where everything depended on the quality and accuracy of the mesh network. Examples of the method's application were provided for various input images, ranging from photographs, drawings, paintings, to designs and caricatures.
Khattab et. al. (2014) presented a comparative study using different color spaces to evaluate the performance of color image segmentation with an automatic variant of the GrabCut technique. GrabCut itself is considered a semi-automatic image segmentation technique because it requires user interaction to initialize the segmentation process; the study replaced this initialization with the unsupervised clustering technique of Orchard and Bouman. The effectiveness of the technique was presented in terms of segmentation quality and accuracy using the color spaces RGB, HSV, CMY, XYZ, and YUV. The comparative study and experimental results on different color images show that RGB is the best color space for processing.
Kim et. al. (2018) presented the opposite approach in the form of rasterization inversion. The most probable sets of paths that could create a raster image were defined. Once the segmentation was calculated, existing vectorization approaches could be used to vectorize each path and then combine all paths into a single image. To determine which set of paths is the most probable, a pair of neural networks was created to provide semantic guidance and help resolve ambiguities in intersecting and overlapping areas. These predictions were made with respect to the complete context of the image and then globally combined by solving a Markov random field. In all tested cases, the system accurately matched the semantics of the drawings, and the meaningfulness of the output was confirmed.
Currently, there is great interest in the vectorization of 3D models, specifically point clouds obtained by laser scanning of the road surface (e.g. Gao et. al., 2017; Yao et. al., 2021; Sui et. al., 2021), but these methods are demanding in terms of both data collection and processing. Emphasis is placed particularly on objects along the roadway, where the key parameter is primarily height, which serves as the basic classification of objects.
Color segmentation alone, without deeper knowledge of the individual information of the analyzed points, is used especially in molecular biology or in the assessment of material structure (Tańska et. al., 2005; Altan et. al., 2005; van Rossum et. al., 1995).
Generally, it can be stated that there are many publications on the subject; however, none of them focuses on transportation infrastructure and the use of orthophoto maps acquired by aerial surveying. The purpose of this contribution is to show that the desired objects can be detected and subsequently converted into vector form through simple operations using segmentation and filtering of the RGB spectrum of the image.

METHODOLOGY
The proposed method is based on outputs from low-altitude aerial photogrammetry of selected road segments (e.g. a road section or an intersection) and subsequent semi-automatic vectorization of selected road elements through segmentation of the RGB spectrum. While the method was mainly designed for and tested on photogrammetric orthophotos, it can also be applied to orthographic views generated from laser scanning. The target road elements can be either linear features (such as horizontal road markings and road edges) or road equipment (such as guard rails, poles and bus stops).
The individual steps of the proposed approach are demonstrated through an example of processing in custom Python software developed by the authors. The software uses publicly available libraries, such as Tkinter, numpy, PIL and imageio, together with custom code. The processing is shown through extraction of the outline of the pavement edge and horizontal traffic signs (absolute and relative positions). This requirement is assumed to be the most frequent for RSA purposes; therefore, it represents an ideal case for validating the applicability of the proposed solution.

Figure 2. Example of input orthographic image of intersection.
The approach is based on processing the direct outputs of digital image correlation, namely the orthophoto and the DEM. Both of these outputs are used in the TIFF format. The main advantage of this format is that it supports lossless compression, so the original image quality is preserved throughout the subsequent processing. A further benefit is the ability to store elevation information of individual measured points in the file and to parameterize the position of each point in the raster grid more precisely. The requirements on the orthographic image mainly concern image quality (such as the absence of vehicles on the road, vegetation occlusions or power line wires) and GSD (GSD of 1 cm or higher).
The processing can be performed without the DEM, but the extracted vectors then lack height information. This may nevertheless be sufficient for certain scenarios, especially in cases of simplified RSA assessment. If the DEM is provided, the sources are combined through an initial analysis of the input data, unification of the information and interpolation of the height information, taking into account potential differences in the resolution or alignment of the provided data.
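The interpolation of heights between rasters of different resolution can be illustrated with a short sketch. It is not the authors' implementation: it assumes that both rasters are north-up, share the same top-left origin and differ only in cell size, and the function name `sample_dem` and its parameters are hypothetical. The height at an orthophoto pixel is estimated by bilinear interpolation between the four surrounding DEM cells.

```python
import numpy as np

def sample_dem(dem, ortho_row, ortho_col, ortho_gsd, dem_gsd):
    """Bilinearly interpolate a DEM height at an orthophoto pixel.

    Assumption: both rasters share the same top-left origin and orientation.
    """
    # Position of the orthophoto pixel expressed in DEM cell units.
    r = ortho_row * ortho_gsd / dem_gsd
    c = ortho_col * ortho_gsd / dem_gsd
    # Clamp to the DEM extent so that border pixels stay valid.
    r = min(max(r, 0.0), dem.shape[0] - 1)
    c = min(max(c, 0.0), dem.shape[1] - 1)
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    r1 = min(r0 + 1, dem.shape[0] - 1)
    c1 = min(c0 + 1, dem.shape[1] - 1)
    fr, fc = r - r0, c - c0
    # Interpolate along columns first, then along rows.
    top = dem[r0, c0] * (1 - fc) + dem[r0, c1] * fc
    bottom = dem[r1, c0] * (1 - fc) + dem[r1, c1] * fc
    return float(top * (1 - fr) + bottom * fr)
```

For example, with a 1 cm orthophoto and a 2 cm DEM, orthophoto pixel (1, 1) falls halfway between four DEM cells and receives the mean of their heights. A misalignment between the two rasters would require an additional offset term, which is omitted here.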

Initial segmentation
The proposed approach is semi-automatic: the user defines the main characteristics of the elements to be extracted. Because color segregation is used, this can be done either by defining RGB ranges or by manually selecting several points of interest on the orthophoto.
The image is subsequently divided into individual color clusters, which are either preserved or automatically removed from the image according to the input requirements. The segmentation is performed not only on the combined RGB information but also on all three channels separately. The image does not change its size; unnecessary points are only filtered out to ease the subsequent processing. This makes it possible to process even high-resolution images and thus longer road segments. These steps also remove part of the noise, as the algorithm automatically eliminates clusters with different color spectra that are not located near the evaluated point or logically do not belong to the evaluated field. The processing is then followed by a function that searches for transitions between individual color spectra. Several previous and following points are checked and classified into groups. In the example (see Figure 3), the classification is focused on the horizontal traffic signs (white) and the ground surface of the road to define the pavement edge (black). The remaining data are separated and marked in green. Furthermore, the filtering also considers the relative distance of the identified clusters and uses this information to change the color segmentation threshold in order to fill potential defects or occlusions in the image.
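Both ways of defining the target color can be sketched with numpy, one of the libraries named above. The sketch is illustrative only: the function names, the `tolerance` parameter and its default value are assumptions, and the image is assumed to be an H x W x 3 array of 8-bit RGB values. Manually selected seed pixels are turned into an RGB range, and the range is then applied to the whole image as a boolean mask.

```python
import numpy as np

def range_from_seeds(image, seeds, tolerance=20):
    """Derive an RGB range from user-selected (row, col) seed pixels."""
    samples = np.array([image[r, c] for r, c in seeds], dtype=int)
    # Widen the sampled range by a tolerance and keep it inside 0..255.
    lo = np.clip(samples.min(axis=0) - tolerance, 0, 255)
    hi = np.clip(samples.max(axis=0) + tolerance, 0, 255)
    return lo, hi

def mask_in_range(image, lo, hi):
    """True where all three channels fall inside the given RGB range."""
    return np.all((image >= lo) & (image <= hi), axis=-1)
```

The mask has the same height and width as the input image, which matches the description that the image is filtered without being resized.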
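The search for transitions, in which several previous and following points are checked, might look as follows for a single image row. This is a simplified sketch under stated assumptions, not the authors' function: pixels are assumed to be already classified into group labels, and the name `find_transitions` and the `window` parameter are hypothetical. A transition is accepted only when a run of identical labels is followed directly by a run of a different label, so that single misclassified pixels do not create spurious edges.

```python
def find_transitions(labels, window=3):
    """Return (index, label_before, label_after) for stable label changes."""
    transitions = []
    for i in range(window, len(labels) - window + 1):
        before = labels[i - window:i]
        after = labels[i:i + window]
        # Both neighbourhoods must be uniform and must differ from each other.
        if len(set(before)) == 1 and len(set(after)) == 1 and before[0] != after[0]:
            transitions.append((i, before[0], after[0]))
    return transitions
```

A larger `window` suppresses more noise but also ignores objects narrower than `window` pixels, so the value would need to be chosen with respect to the GSD and the width of the searched markings.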

Noise removal, object analysis and further adjustments
Based on the results and optional input from the user, the initial processing is followed by a "cleansing" phase. A "cleansing" function is used to reduce the errors from previous calculations and fill potential gaps in the identified objects. If isolated points appear that do not logically fit into a given surface classification, they are recolored into the same color group as their surroundings. This is done with partial input from the user performing the processing, who selects the level of noise removal. To eliminate minor noise in the image, representative outlines are computed for each identified object and the transitions between the individual colors defined by the groups are highlighted. The next step removes the color information from further processing and preserves only the outline edges of the designated color groups (Figure 4). At this stage, the outlines are rendered and objects are created which already resemble the final output. However, the image is still affected by noise that could not be identified and removed automatically in the previous steps. Currently, the following stage works in a semi-automatic mode. The image is divided into objects, and the information about each found object and its positional coordinates is displayed together with the predicted type of the object (e.g. isolated, continuous, dashed line). The user then checks the correctness of this assignment and, by selecting multiple adjacent objects, can match corresponding objects or perform corrections. These steps can be used to create a single integrated object from several associated objects, which is then treated as one object in the next stage. At the same time, it is possible to erase objects that do not meet the desired criteria.
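Two of the operations described above, recoloring isolated points and keeping only the outline of a color group, can be sketched on a 2D array of group labels. The sketch makes its own assumptions: the 8-pixel neighbourhood used for the isolation test, the 4-pixel neighbourhood used for the outline, and the function names are illustrative choices rather than details confirmed by the described software.

```python
import numpy as np
from collections import Counter

def recolor_isolated(labels):
    """Give pixels with no same-label neighbour the most common neighbour label."""
    h, w = labels.shape
    out = labels.copy()
    for r in range(h):
        for c in range(w):
            neighbours = [int(labels[rr, cc])
                          for rr in range(max(r - 1, 0), min(r + 2, h))
                          for cc in range(max(c - 1, 0), min(c + 2, w))
                          if (rr, cc) != (r, c)]
            if neighbours and int(labels[r, c]) not in neighbours:
                out[r, c] = Counter(neighbours).most_common(1)[0][0]
    return out

def outline(labels, target):
    """Boolean mask of target pixels that touch a different label (4-neighbourhood)."""
    mask = labels == target
    padded = np.pad(mask, 1, mode="edge")
    # A pixel is interior when its upper, lower, left and right neighbours match.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior
```

The explicit Python loops keep the sketch readable; for full-resolution orthophotos a vectorized or windowed implementation would be needed.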

Vectorization
The final stage, image vectorization, uses one of two approaches depending on the input requirements of the user. The first approach searches only for the edges of objects, that is, the transitions in the color spectrum, and defines the outline only through individual points. The second approach analyzes the surface of a certain color spectrum. In both approaches, the axis of the object is determined and serves as the final output.
In the first case, each object in the image is analyzed independently. Each object is treated as a unique unit defined by a certain number of points. The outline of an object may be composed of a cloud of points located in a cyclic arrangement, as with dashed horizontal markings. In this case, the method of finding extreme points is used. These are then fitted with a vector and smoothed so as to resemble reality as closely as possible (edges are created and subsequently joined) (Figure 5). Another case is when an object, such as the edge of the pavement, runs throughout the whole image and there is no cyclic repetition. In such a situation, the method searches for points that, within a certain range from the initial point of the object, follow or extend the input vector generated at the edge of the image (Figure 6).

Figure 6. Extraction of complex situation with continuous road elements across the whole image.
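The two cases of the first approach can be illustrated by a minimal sketch. Both functions are assumptions made for illustration: `dash_axis` finds the extreme points of one dash by projecting its outline points onto their principal direction (a technique substituted here for the unspecified extreme-point search), and `trace_edge` follows a continuous element by repeatedly stepping to the nearest unused point within `max_step`, which is a simplification of following the input vector.

```python
import math
import numpy as np

def dash_axis(points):
    """Return the two end points of the axis of a compact outline point set."""
    pts = np.asarray(points, dtype=float)
    centre = pts.mean(axis=0)
    # The first right singular vector is the direction of largest spread.
    _, _, vt = np.linalg.svd(pts - centre, full_matrices=False)
    direction = vt[0]
    t = (pts - centre) @ direction
    return centre + t.min() * direction, centre + t.max() * direction

def trace_edge(points, start, max_step):
    """Chain points from `start`, stopping when the next point is too far away."""
    remaining = [tuple(p) for p in points if tuple(p) != tuple(start)]
    path = [tuple(start)]
    while remaining:
        last = path[-1]
        nearest = min(remaining,
                      key=lambda p: math.hypot(p[0] - last[0], p[1] - last[1]))
        if math.hypot(nearest[0] - last[0], nearest[1] - last[1]) > max_step:
            break
        path.append(nearest)
        remaining.remove(nearest)
    return path
```

For a dash whose outline is 10 units long and 2 units wide, `dash_axis` returns the midpoints of the two short sides, that is, the ends of the centre line. Points farther than `max_step` from the traced chain are left out, which also discards distant noise.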
The curve is smoothed using the moving average method, so that its shape closely resembles the actual transition between individual elements (e.g., asphalt and unpaved surface, vegetation). The moving average method can also be replaced by polynomial regression, which approximates the given values by a polynomial. The coefficients of the polynomial are calculated by the least squares method, so that the sum of the squared deviations of the original values from the obtained polynomial is minimal. Determining the appropriate degree of the polynomial is crucial for accuracy, as a more irregular distribution of points requires a higher polynomial degree. Automating the selection of an adequate polynomial degree is problematic, as it is not always possible to determine the most suitable degree precisely. Therefore, if the user selects this approach, multiple polynomial curves are generated and the most suitable one is selected.
The second approach works with an area, or color cluster, which is generated by an algorithm and considered a cohesive object (Figure 7). In this case, it is necessary to consolidate the object into a coherent structure or to analyze the surrounding points and, for isolated points, convert their color to the color of the surrounding points. It is also possible to apply linear regression to the point cloud, which can detect the approximate axis of the object. This method depends on the subsequent joining of nearby vectors and their smoothing.
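Both smoothing options can be sketched with numpy. The moving average follows directly from the description; the way the most suitable polynomial is chosen is not specified in the text, so the rule used here (the lowest degree whose root-mean-square residual falls below a tolerance, otherwise the best fit found) is an assumption, as are the function names and default values.

```python
import numpy as np

def moving_average(values, window=5):
    """Smooth a sequence with a centred moving average (shortens the series)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

def fit_polynomial(x, y, degrees=(1, 2, 3, 4), tolerance=0.5):
    """Least-squares fits of several degrees; returns (degree, coeffs, rms)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for degree in sorted(degrees):
        coeffs = np.polyfit(x, y, degree)  # minimises the sum of squared deviations
        rms = float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))
        if best is None or rms < best[2]:
            best = (degree, coeffs, rms)
        if rms <= tolerance:
            # Assumed selection rule: prefer the simplest adequate curve.
            return degree, coeffs, rms
    return best
```

Preferring the lowest adequate degree guards against the oscillations that high-degree polynomials tend to produce between sparse edge points; the tolerance would be expressed in pixels or in ground units depending on the input coordinates.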
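The axis detection by linear regression can be sketched as follows, assuming the color cluster is available as a non-empty boolean mask. The function name and the choice to regress along the direction of larger extent (so that near-vertical objects do not produce an unstable slope) are illustrative assumptions.

```python
import numpy as np

def cluster_axis(mask):
    """Approximate the axis of a cluster; returns two (row, col) end points."""
    rows, cols = np.nonzero(mask)
    if cols.max() - cols.min() >= rows.max() - rows.min():
        # Mostly horizontal object: model row as a linear function of column.
        slope, intercept = np.polyfit(cols, rows, 1)
        c0, c1 = float(cols.min()), float(cols.max())
        return ((float(slope * c0 + intercept), c0),
                (float(slope * c1 + intercept), c1))
    # Mostly vertical object: model column as a linear function of row.
    slope, intercept = np.polyfit(rows, cols, 1)
    r0, r1 = float(rows.min()), float(rows.max())
    return ((r0, float(slope * r0 + intercept)),
            (r1, float(slope * r1 + intercept)))
```

A single straight line is only adequate for short or nearly straight objects; for curved markings the cluster would first be split into segments whose axes are then joined and smoothed, as described above.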

Figure 7. Merging of isolated color clusters, smoothing and final vector representation.
The advantage of using photogrammetric orthophotos is that not only the relative position of the resulting vectors in the image is known, but also the position of the image itself in the reference coordinate system. Thus, the actual position can be determined and subsequently exported in formats suitable for further processing in CAD programs that support design and construction, such as Autodesk or Bentley Systems products. If the DEM in TIFF format is available, the height coordinate is also determined.
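The conversion from image position to reference coordinates can be sketched for the simplest case. The sketch assumes a north-up orthophoto without rotation, square pixels, and an origin given as the outer corner of the top-left pixel; the function and parameter names are hypothetical, and real georeferencing metadata may contain rotation terms that are ignored here.

```python
def pixel_to_world(row, col, origin_x, origin_y, gsd, height=None):
    """Map a pixel centre to reference coordinates (north-up image assumed)."""
    x = origin_x + (col + 0.5) * gsd   # easting grows with the column index
    y = origin_y - (row + 0.5) * gsd   # northing decreases with the row index
    return (x, y) if height is None else (x, y, height)
```

When a DEM is available, the interpolated height can be passed as `height` so that each vertex of the exported vector carries three coordinates.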

DISCUSSION
The principle of object identification based on segmentation of the RGB spectra of individual pixels is a complicated and highly problematic matter, which can, however, lead to significant time savings. Because the algorithm does not work on the principle of deep learning, it is more difficult to extract complex properties from the input data, and it is assumed that at a certain stage the user has to guide the algorithm, for example by deciding whether to continue working with particular objects or whether they can be considered noise and removed, especially in the case of poorly identifiable objects. Vectorising road networks using color segmentation exhibits complications and deficiencies that need to be addressed in subsequent tests to achieve higher efficiency with minimal user assistance. A typical problem is the segmentation of a certain color spectrum, such as horizontal traffic signs, which in certain places on the road show signs of wear or are covered with dirt. In such a case, it is difficult for the algorithm to detect the entire object because the RGB spectrum of the affected area is not sufficiently similar to the spectrum of the filtered colors. This problem could be solved in the future, for example, by using machine-predefined palettes of individual objects that can be inserted into the image or overlaid on the part that the algorithm was able to identify automatically. Another potential improvement is the use of combinatorics to find the best correction of an image with a partially defined internal surface structure. A further way to identify individual objects more precisely is to work with duplicate photographs that have already been adjusted to a different color spectrum, for example a raster image in which edges are highlighted even in places where they are barely noticeable in the original RGB spectrum. By using this image as a background, it would be possible to better detect individual shapes in the image and thus improve the object recognition system.
At the same time, the resulting vectorization could be affected, for example, by a photograph taken on a sunny day with significant contrasts caused by shadows of surrounding vegetation, where the detection of a specific color could be influenced. The same applies when a part of the road is covered by branches or by a gantry for variable traffic signs. In such a case, it could be appropriate to use an image that is color-scaled according to the height of the terrain. If the algorithm could not detect the correct color shade at the object location, it would be possible to decide, based on the adjusted photograph, whether the location is affected by an external factor and how to continue working with the image.

CONCLUSIONS
In conclusion, using low-altitude aerial orthophotos as an input source of information has proven to be highly effective in accelerating basic design work and assessing road safety in RSA. Compared to available map data from traditional aerial photogrammetry, UAS orthophoto maps capture the current state and offer significantly higher resolution. However, further development in automatic data extraction from orthophotos is necessary to make working with them more efficient and to shorten the time required to obtain the necessary information. The method proposed in this study provides a new approach to semi-automatic object detection in orthophotos based on their RGB spectra, with subsequent vectorization, which can significantly improve the overall process. The development is currently in the testing phase, and further measurements are needed to increase accuracy and to achieve a more automated process. The final output will be a web application that is easy to use and available without the need to install programming software. It is expected to contribute to the automation of the entire workflow and reduce the manual labor required for vectorization. Future research should focus on improving the accuracy of the method and evaluating its performance on larger datasets.