VALIDATION ANALYSIS OF OPENSTREETMAP DATA IN SOME AREAS OF CHINA

The rapid development of computer technologies has led to a growing number of open-source, web-based map services such as OpenStreetMap, a global vector dataset created by volunteers for free use. There is concern about the quality and usability of OpenStreetMap data because the volunteers who contribute it generally lack sufficient cartographic training. This paper focuses on a data quality analysis method for OpenStreetMap and proposes a model for usability evaluation. OpenStreetMap data were compared with 1:10 000 topographic data in some areas of China to verify the proposed model, and the method proved to be effective.


INTRODUCTION
With the advent of Web 2.0, the pattern of geographic information services has changed tremendously. People have gradually become geographic information providers by uploading their own data, which Goodchild termed Volunteered Geographic Information (VGI) [1]. OpenStreetMap (OSM) is one valuable application of VGI.
OSM was initiated by Stephen Coast in July 2004 at University College London. Since its establishment, OSM has expanded steadily: the number of registered users grew from hundreds in mid-2004 to more than five hundred thousand in November 2011. As an online collaborative mapping project in which individuals voluntarily capture, process and disseminate geographic information, it aims to create and distribute vector data for the world, because most maps thought of as free actually have legal or technical restrictions on their use, holding people back from using them in creative, productive or unexpected ways. OSM obtains vector data from three sources: GPS traces recorded by users with hand-held receivers, donations from institutions and organizations, and vectorization of imagery such as Landsat and Yahoo imagery. Spatial data is the basis of geographic information, and its quality directly affects the accuracy of spatial analysis and operations. Although the OSM project has many advantages, there are concerns about how good its quality is and which applications it can support, because the volunteers who contribute lack professional knowledge and sufficient cartographic training. This paper focuses on a method for analyzing OSM data quality and proposes an evaluation model.
Many scholars have studied OSM data quality [2][3][4]. Initially, Mordechai Haklay examined the positional accuracy and length completeness of OSM data for England through comparison with the Ordnance Survey's Meridian 2 dataset. The methodology used to evaluate positional accuracy was based on Goodchild and Hunter (1997) and Hunter (1999). The comparison used buffers to determine the percentage of a line from one dataset that lies within a certain distance of the same feature in another dataset of higher accuracy. Completeness was calculated as Σ(OSM roads length) - Σ(Meridian roads length). The analysis shows that OSM information can be fairly accurate. Overall, there were three problems with the analysis of OpenStreetMap. First, efficiency was low because high-accuracy vector data are difficult to obtain as reference data. Second, the methods for assessing data completeness and attribute accuracy needed improvement; for example, the formula for length completeness could not fully reflect data quality when some important roads were missing. Third, the studies above analyzed data quality element by element, which readily showed how each quality element performed but had difficulty describing which fields the data were fit for.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-4, 2014. ISPRS Technical Commission IV Symposium, 14-16 May 2014, Suzhou, China. This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-4-383-2014

OSM Quality Evaluation Model
How to assess spatial data quality accurately has long been a concern for data producers and users. The following quality elements are usually used:
Lineage - the history of the dataset, such as how it was collected and how it evolved.
Position accuracy - the accuracy of the position of features or geographic objects in either two or three dimensions.
Attribute accuracy - the degree to which attribute field values adhere to the true data.
Logical consistency - the degree of adherence to logical rules of data structure, attribution and relationships.
Completeness - the presence and absence of objects in a dataset.
Temporal quality - a measure of the validity of changes in the database in relation to real-world changes, and also the rate of updates.
Considering practicality and operability, we conduct the study on completeness, position accuracy, attribute accuracy and logical consistency, which are regarded as first-level elements.
We then subdivide these elements into more detailed second-level elements: completeness is divided into length completeness and name completeness, and attribute accuracy is divided into type accuracy and name accuracy.
Logical consistency is assessed by whether the data satisfies the topology rules.
The model uses weight coefficients to express the contribution of each spatial data quality element to the result of the comprehensive evaluation; each weight coefficient reflects the relative importance of the element taking part in the evaluation.
Four methods are commonly used to determine weight coefficients: subjective judgment based on experience, expert investigation or consultation, voting by a judging panel, and the analytic hierarchy process (AHP). After reading related documents [5][6] and analyzing the importance of the elements, we assign weights to the first-level and second-level elements.
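As a minimal sketch of how weight coefficients can combine element scores, the Python below assumes that the comprehensive score is the weighted sum of the element scores and that the weights sum to 1. The function name `overall_score` and all weight and score values are placeholders chosen for illustration; they are not the coefficients assigned in this paper. The pass mark of 60 per element is the rule stated in this section.

```python
from typing import Dict, Tuple

PASS_MARK = 60.0  # data is regarded as unqualified if any element scores below 60


def overall_score(scores: Dict[str, float], weights: Dict[str, float]) -> Tuple[float, bool]:
    """Return (weighted score, qualified flag) for a set of element scores.

    Assumption: the comprehensive score is sum(P_i * Q_i), weights summing to 1.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same elements")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = sum(weights[name] * scores[name] for name in scores)
    qualified = all(value >= PASS_MARK for value in scores.values())
    return total, qualified


# Hypothetical weights and scores, for illustration only.
example_weights = {"attribute": 0.25, "completeness": 0.25, "position": 0.25, "logic": 0.25}
example_scores = {"attribute": 80.0, "completeness": 70.0, "position": 90.0, "logic": 60.0}
total, qualified = overall_score(example_scores, example_weights)
```

The same function can be applied at both levels: once to combine second-level scores into a first-level score, and once to combine first-level scores into the overall result.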
In the evaluation formula, $P_i$ is the weight coefficient of a quality element and $Q_i$ is the score of that quality element. The data is regarded as unqualified if any element's score is less than 60. The data quality rating is classified according to current standards.
The type accuracy of other roads is calculated in the same way as that of G, S, X and Y roads:
$$Q_{112} = 100 \times \frac{\text{length of roads with the right other type}}{\text{total length of roads with the right other type and of those that should be modified to other types}}$$
Type accuracy is then obtained by combining the two scores with the scaling factors 0.6 (G, S, X and Y roads) and 0.4 (other roads):
$$Q_{11} = 0.6\,Q_{111} + 0.4\,Q_{112}$$
For the name annotation of the same road, there are four cases: both the reference data and the experiment data have a name; the reference data has a name but the experiment data does not, which is considered under name completeness; the experiment data has a name but the reference data does not; and neither has a name.
In the name accuracy analysis, roads with wrong names are those for which both datasets have a name but the names do not match. Name accuracy is calculated as:
$$Q_{12} = 100 - 100 \times \frac{\text{length of roads with a wrong name in OSM}}{\text{length of roads with a name in OSM}}$$
Attribute accuracy $Q_1$ is then calculated from $Q_{11}$ and $Q_{12}$ according to their weight coefficients.
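The four name cases and the name accuracy score can be illustrated with a short Python sketch. It assumes each road is already paired between the two datasets and represented as a hypothetical tuple `(length, reference_name, osm_name)`, and it treats names as matching only when the strings are identical. That is a simplification: the study itself compares names visually.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical record layout: (length, reference_name, osm_name); None means no name.
Road = Tuple[float, Optional[str], Optional[str]]


def name_case(ref_name: Optional[str], osm_name: Optional[str]) -> str:
    """Classify a paired road into one of the four naming cases."""
    if ref_name and osm_name:
        return "both_named"
    if ref_name:
        return "missing_in_osm"  # handled under name completeness
    if osm_name:
        return "osm_only"
    return "both_unnamed"


def name_accuracy(roads: Iterable[Road]) -> float:
    """Q12 = 100 - 100 * (length with wrong name in OSM) / (length with name in OSM)."""
    named = 0.0
    wrong = 0.0
    for length, ref_name, osm_name in roads:
        if osm_name:
            named += length
            # A wrong name means both datasets have a name and they differ.
            if ref_name and ref_name != osm_name:
                wrong += length
    if named == 0:
        raise ValueError("no named OSM roads to evaluate")
    return 100.0 - 100.0 * wrong / named


# Illustrative data only; lengths are in arbitrary units.
sample_roads = [
    (300.0, "Binhe Mid Road", "Binhe East Road"),  # both named, mismatch
    (500.0, "Road A", "Road A"),                   # both named, match
    (200.0, None, "Road B"),                       # named only in OSM
    (400.0, "Road C", None),                       # missing in OSM
]
q12 = name_accuracy(sample_roads)
```

Here 1000 units of road carry an OSM name and 300 of them are wrong, so the score is 70.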

Data Completeness
This paper focuses on length completeness and name completeness. Length completeness refers to whether roads are missing compared with the reference data. Name completeness concerns roads that have a name in the reference data but not in the experiment data.
Length completeness analysis usually compares the total lengths of the two datasets. This reflects how detailed the experiment data is, but it cannot ensure that the major roads exist: when the experiment data is more detailed than the reference data, its total length must be larger than that of the reference data, even if some major roads are missing. Redundant roads in the experiment data do not take part in the score calculation, because the reference data is of high precision and can satisfy certain applications. Hence, the authors modify the previous calculation method.
Previously, length completeness was calculated as the percentage of the length of the experiment dataset relative to the length of the reference dataset. The current calculation formula is as follows:
$$Q_{21} = 100 - 100 \times \frac{\text{length of roads in the reference data that are missing from OSM}}{\text{total length of reference data}}$$
For name completeness, we likewise consider only names that are missing compared with the reference data:
$$Q_{22} = 100 - 100 \times \frac{\text{length of OSM roads missing the name}}{\text{total length of OSM data}}$$
$Q_2$ is then obtained from $Q_{21}$ and $Q_{22}$ according to their weight coefficients.
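To make the difference between the previous and the current completeness measures concrete, the following Python sketch evaluates both on made-up lengths. The function names and the numbers are illustrative assumptions, not values from the study areas.

```python
def length_ratio(osm_total: float, ref_total: float) -> float:
    """Previous measure: OSM length as a percentage of reference length."""
    if ref_total <= 0:
        raise ValueError("reference length must be positive")
    return 100.0 * osm_total / ref_total


def length_completeness(missing_ref_length: float, ref_total: float) -> float:
    """Q21: penalise only reference roads that are absent from OSM."""
    if ref_total <= 0:
        raise ValueError("reference length must be positive")
    return 100.0 - 100.0 * missing_ref_length / ref_total


def name_completeness(osm_unnamed_length: float, osm_total: float) -> float:
    """Q22: penalise OSM roads that lack a name present in the reference data."""
    if osm_total <= 0:
        raise ValueError("OSM length must be positive")
    return 100.0 - 100.0 * osm_unnamed_length / osm_total


# Illustrative lengths in km: OSM is more detailed overall, yet 20 km of
# reference roads have no counterpart in OSM.
ref_total_km = 100.0
osm_total_km = 130.0
missing_km = 20.0

previous = length_ratio(osm_total_km, ref_total_km)
q21 = length_completeness(missing_km, ref_total_km)
q22 = name_completeness(26.0, osm_total_km)
```

With these numbers the previous ratio is 130, which hides the gap, while Q21 is 80 and reflects the 20 km of missing reference roads.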

Position Accuracy
For position accuracy, Tveite [7] defines two aspects of linear accuracy: a) positional point accuracy: positional accuracy can easily be given for well-defined points on the line (e.g. the end-points), whereas for the rest of the line it is difficult to say anything about positional accuracy or to quantify it; b) shape fidelity: to be able to say something about the accuracy of a line, it is useful to talk about its shape fidelity as compared to another line. This paper adopts a buffer-based accuracy measure for digitized linear features [8]. Its idea is to regard the reference data as the true data, because the reference data has higher accuracy, and normally the deviation of the experiment data from the reference data should fall within a certain range. As Figure 1 shows, a buffer of width x (x equals half of the road width) is created around the reference feature, and the proportion of the experiment feature that lies within the buffer is calculated. The method has several advantages: (1) it is relatively insensitive to extreme outliers; (2) it does not require matching features between the datasets; (3) it is easy to apply, because it is based on a simple overlay process that can be done in most vector GIS programs.
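The buffer comparison described above can be approximated without a GIS package. The Python sketch below is not the overlay procedure used in the study: it samples the experiment line at short intervals and counts the share of its length whose sample points lie within distance x of the reference line. The function names, the sampling step and the example coordinates are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_line(p: Point, line: List[Point]) -> float:
    """Shortest distance from p to any segment of a polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def share_within_buffer(test_line: List[Point], ref_line: List[Point],
                        buffer_width: float, step: float = 1.0) -> float:
    """Percentage of test_line length lying within buffer_width of ref_line."""
    inside = 0.0
    total = 0.0
    for a, b in zip(test_line, test_line[1:]):
        seg_len = math.hypot(b[0] - a[0], b[1] - a[1])
        if seg_len == 0:
            continue
        pieces = max(1, math.ceil(seg_len / step))
        piece_len = seg_len / pieces
        for i in range(pieces):
            t = (i + 0.5) / pieces  # midpoint of each small piece
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if distance_to_line(mid, ref_line) <= buffer_width:
                inside += piece_len
        total += seg_len
    if total == 0:
        raise ValueError("test line has zero length")
    return 100.0 * inside / total


# Illustrative coordinates in meters: the test road follows the reference
# road at a 5 m offset for 50 m, then drifts 20 m away.
reference = [(0.0, 0.0), (100.0, 0.0)]
experiment = [(0.0, 5.0), (50.0, 5.0), (50.0, 20.0), (100.0, 20.0)]
share = share_within_buffer(experiment, reference, buffer_width=8.0)
```

Because the result is a proportion of length rather than a mean distance, a short stretch that strays far from the reference road lowers the score only by its own length, which is the insensitivity to outliers noted above.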
In this calculation, scaling factor is subdivided into 0.
Logical consistency is scored as:
$$Q_4 = 100 - 100 \times \frac{\text{length of OSM roads that do not obey the topology rules}}{\text{total length of OSM roads}}$$

METHOD VALIDATION
1:10 000 national basic data is chosen as the reference data to validate the quality evaluation method for OSM vector road data, and three cities, Handan, Lanzhou and Nantong, are chosen as study areas.

Calculation of the Attribute Accuracy
As mentioned in section 2, road type errors fall into two kinds: a road that should be coded has an empty code or the wrong road type, or a road whose type should be null has a code. Table 3 shows the type accuracy; the data distribution of type accuracy is given in Appendix 1. Table 3 shows that the gap in type accuracy among the three cities is very large, and that roads which should be coded are often missing the type.
Because the number of roads with names is small, name accuracy analysis is performed by visual comparison to find roads with wrong names (the data distribution of name accuracy is given in Appendix 2). Table 5 shows the attribute accuracy scores.

Calculation of the Completeness
ArcGIS is used to select from the reference data the roads that are missing in OSM (the data distribution of length completeness is given in Appendix 3, and that of name completeness in Appendix 4).

Calculation of the Position Accuracy
The reference data contains some road edges, which can be used to generate road surfaces. For roads without edges, centerlines are used to generate buffers, with a distance of 11.25 meters for G and S roads and 7.5 meters for other roads (the data distribution of position accuracy is given in Appendix 5). The quality level of the three cities is evaluated according to the quality rating standard, and the results, including the assessment of each quality element, basically conform to the experts' evaluation results.

Name accuracy
To better describe spatial data quality, quality elements are used to express it. In recent years, scholars have studied in depth which elements should be used but have not yet agreed on a uniform set; the elements listed under the OSM Quality Evaluation Model are the ones usually used. Road type is divided according to the road's position, role and service function in the road system, which reflects road importance to some extent. Chinese roads are divided by administrative level into national highways, provincial roads, county roads, township roads and dedicated roads, and the first four kinds are coded G, S, X and Y respectively. In the road type accuracy calculation, the scaling factor is subdivided into 0.6 and 0.4: G, S, X and Y roads are weighted 0.6 and other roads are weighted 0.4. The type accuracy of G, S, X and Y roads is calculated as follows:
$$Q_{111} = 100 \times \frac{\text{length of roads with the right type G, S, X and Y}}{\text{total length of roads with the right type G, S, X and Y and of those that should be modified to G, S, X and Y}}$$
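The type accuracy calculation can be sketched in Python as follows. The sketch assumes that the score for other roads is computed with the same ratio as for G, S, X and Y roads and that the two scores are combined with the 0.6 and 0.4 scaling factors given above. The function names and the example lengths are illustrative, not results from the study.

```python
def ratio_score(correct_length: float, to_modify_length: float) -> float:
    """100 * correct / (correct + should-be-modified), as in the Q111 formula."""
    total = correct_length + to_modify_length
    if total <= 0:
        raise ValueError("total length must be positive")
    return 100.0 * correct_length / total


def type_accuracy(gsxy_correct: float, gsxy_to_modify: float,
                  other_correct: float, other_to_modify: float,
                  w_gsxy: float = 0.6, w_other: float = 0.4) -> float:
    """Combine the G/S/X/Y score and the other-roads score with scaling factors."""
    q111 = ratio_score(gsxy_correct, gsxy_to_modify)
    q112 = ratio_score(other_correct, other_to_modify)
    return w_gsxy * q111 + w_other * q112


# Illustrative lengths in km: 80 of 100 km of G/S/X/Y roads are typed
# correctly, and 90 of 100 km of other roads are typed correctly.
q11 = type_accuracy(80.0, 20.0, 90.0, 10.0)
```

With these lengths the two ratio scores are 80 and 90, giving a combined type accuracy of 84. Weighting the coded roads more heavily means that errors on national, provincial, county and township roads cost more than the same length of error on minor roads.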

Figure 5. Lanzhou name accuracy
3. Length completeness
In the comparison with the Ordnance Survey data, OSM was on average within about 6 meters of the position recorded by the Ordnance Survey, with approximately 80% overlap of motorway objects between the two datasets.

Table 1
The evaluation model connects the quality elements through weight coefficients. After obtaining each element's score, we sum the scores according to the weight coefficients.

Table 4
In Handan and Nantong, only one road has a name, which leads to a score of either full marks or zero. In Lanzhou, the errors are mainly wrong descriptions of orientation, for example Binhe Mid Road labeled as Binhe East Road.