A RESEARCH ON SPATIAL TOPOLOGICAL ASSOCIATION RULES MINING

Spatial association rules mining is a process of acquiring information and knowledge from large databases. Due to the nature of geographic space and the complexity of spatial objects and relations, classical association rule mining methods are not suitable for spatial association rule mining. Classical association rule mining treats all input data as independent, while spatial data often show high autocorrelation among nearby objects. The contiguous, adjacent and neighboring relations between spatial objects are important topological relations. In this paper, a new approach based on topological predicates to discover spatial association rules is presented. First, we develop a fast method to obtain the topological relationships of spatial data from their algebraic structure. Then the spatial objects of interest are selected, using topological relations combined with distance; in this step, the frequent topological predicates are obtained. Next, the attribute datasets of the selected spatial objects are mined with the Apriori algorithm. Finally, the spatial topological association rules are obtained. The approach has been implemented and tested with county-level data on GDP per capita, railroads and roads in China for the year 2005. The experimental results show that the approach is effective and valid.


INTRODUCTION
Spatial association rules mining is a technique that mines association rules in spatial databases by considering spatial properties and predicates [1,2,3]. One important problem is how to extract from the spatial objects the spatial relations that compose these properties and predicates, and how to translate the unstructured spatial relations into a structured expression so that they can be mined together with the non-spatial data. There is much research on spatial association rules mining. Association rule mining (ARM) algorithms involving spatial relations such as direction, distance and topology have been well discussed in the literature [2,4,5,6,7], whereas Fenzhen Su [8] focused on using spatial difference to express how spatial relations affect the spatial association rules of interest that can be obtained.
As the spatial topological relation is one of the most important spatial relations, much of the literature on spatial association rules mining focuses on it and presents many ways to obtain the topological relation, such as RCC (Region Connection Calculus), the classical MBR (minimum bounding rectangle) and the 9-intersection model. Ickjai Lee et al. compared the efficiency of different RCC models when used to obtain the topological relations of a group of objects. MBR is a fast method to obtain rough topological relations, so it is often used together with other precise but expensive means. Eliseo Clementini et al. [3] put forward a method for mining spatial objects with uncertainty, which uses objects with a broad boundary to take the uncertainty of spatial information into account. The topological relations between objects with broad boundaries are described by the 9-intersection model, which can be concisely represented by a special 3 × 3 matrix.
As an indicator of a country's standard of living, GDP per capita is regarded as a kind of attribute data and is represented by polygons at the county level (Figure 1), 3406 polygons in total.

Definitions
Definition 1 (Unit) According to the theory of cell structure proposed by J. Corbett [9], spatial features should be divided into points, lines, surfaces and volumes. For instance, roads are directed lines and buildings are polygons. Because a polygon is made up of lines, and a point can be regarded as a degenerate line, the line unit algebraic structure is the most basic and most significant content of this paper.

X x is
given, then expression y=a(x) is the permutation of x under the precondition of X y  [10] .
Definition 3 (Involution) Involution can be seen as a particular case of permutation, if y=a(x)=a[a(x)]=x came into existence, this particular permutation would be called involution.
Definition 4 (Zero-order permutation 0 a (x) ) Zero-order permutation is defined as the transformation from unit variable +x to -x or from -x to +x based on the concept of involution.
Zero-order permutation gives a line with direction +x to -x.Definition 5 (First-order permutation 1 a (x)) First-order permutation records all the adjacent line units in counterclockwise order.First-order permutation gives the expression of point with all its crossing lines.Definition 6 (Algebraic Structure) Based on the concepts and idea above, the expression {X , 0 a (x), 1 a (x)} is called the algebraic structure of spatial graphic.

Structure
Take the line graphic in Figure 3 as an example. Line units are not allowed to cross each other.
The second step is to produce the zero-order permutation $a_0(x)$ and the first-order permutation $a_1(x)$ for each $x \in X$, as Table 1 shows. To save memory, $a_1(x)$ is recorded as the nearest adjacent line unit found in the counterclockwise rotation search, instead of all the adjacent line units. For instance, line unit 1 rotates counterclockwise with P1 as pivot and gets $a_1(1) = 5$, while line unit -1 rotates counterclockwise with P2 as pivot and gets $a_1(-1) = 3$.
Since the topological relation is represented as a functional relation between units, a topological operation on a given unit affects only its adjacent units. Therefore, representing topological relations by the line unit algebraic structure can effectively eliminate data linkage and topological redundancy, improving the efficiency of spatial analysis.
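The two permutations can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the point coordinates, the unit numbering and the helper names (`endpoints`, `angle`, `orbit`) are hypothetical, line units are assumed to be straight segments, and a unit with no neighbor at its pivot is assumed to map to itself. The `orbit` function shows one possible combination of $a_0$ and $a_1$, which walks the closed boundary cycle that a unit belongs to.

```python
import math

# Hypothetical sample graphic: a triangle P1-P2-P3 plus a dangling spoke P1-P4.
points = {"P1": (0.0, 0.0), "P2": (1.0, 0.0), "P3": (0.0, 1.0), "P4": (-1.0, 0.0)}
# Index of unit variables X: unit id -> (start point, end point) of +x.
units = {1: ("P1", "P2"), 2: ("P2", "P3"), 3: ("P3", "P1"), 4: ("P1", "P4")}


def a0(x):
    """Zero-order permutation: reverse the direction of a line unit."""
    return -x


def endpoints(x):
    """Start and end point of a signed unit (-x runs from end to start)."""
    start, end = units[abs(x)]
    return (start, end) if x > 0 else (end, start)


def angle(x):
    """Direction of a signed unit, measured at its start point."""
    start, end = endpoints(x)
    (x0, y0), (x1, y1) = points[start], points[end]
    return math.atan2(y1 - y0, x1 - x0)


def a1(x):
    """First-order permutation: nearest unit found by rotating x
    counterclockwise around its start point (the pivot)."""
    pivot = endpoints(x)[0]
    candidates = [u for k in units for u in (k, -k)
                  if u != x and endpoints(u)[0] == pivot]
    if not candidates:
        return x  # assumption: a dangling end maps to itself
    base = angle(x)
    return min(candidates, key=lambda u: (angle(u) - base) % (2 * math.pi))


def orbit(x):
    """Repeatedly apply a1(a0(.)) to walk the boundary cycle containing x."""
    cycle, current = [], x
    while True:
        cycle.append(current)
        current = a1(a0(current))
        if current == x:
            return cycle


if __name__ == "__main__":
    for x in (1, -1, 2, -2, 3, -3, 4, -4):
        print(x, "a0:", a0(x), "a1:", a1(x))
    print("cycle through -1:", orbit(-1))
    print("cycle through +1:", orbit(1))
```

Because only the nearest counterclockwise neighbor is stored per unit, each signed unit needs a single integer for $a_1$, which matches the memory-saving choice described in the text.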

Association Rules Mining
The topological relation matrix in part 2.2 is treated as the transaction database to be mined, and needs inspection. From the numbered graphic and the index of unit variables X, we can produce the zero-order permutation $a_0(x)$ and the first-order permutation $a_1(x)$. For instance, the original railway layer is represented as in Figure 6, while the GDP per capita layer is more complex, as Figure 7 shows. To apply the association rules mining algorithm, we first need to generalize the continuous attribute, GDP per capita. In this experiment, the GDP per capita data are generalized into three levels by value: with the counties sorted from the highest value to the lowest, the upper third is "high", the middle third is "medium" and the rest is "low".
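The three-level generalization of GDP per capita can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `tercile_levels`, the county names and the values are hypothetical, and how ties or counts not divisible by three are handled in the experiment is not stated in the text, so the rank-based split below is only one reasonable choice.

```python
def tercile_levels(values):
    """Map each key to 'high', 'medium' or 'low' by rank, highest value first."""
    labels = ("high", "medium", "low")
    ordered = sorted(values, key=values.get, reverse=True)
    n = len(ordered)
    # rank * 3 // n is 0 for the upper third, 1 for the middle third, 2 for the rest
    return {name: labels[rank * 3 // n] for rank, name in enumerate(ordered)}


if __name__ == "__main__":
    # Hypothetical GDP per capita values for six counties (arbitrary units).
    gdp = {"A": 9.0, "B": 7.5, "C": 6.0, "D": 4.2, "E": 3.1, "F": 1.0}
    print(tercile_levels(gdp))
```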

Filtered result of frequent 3-itemsets
After further analyzing the filtered result, we find that these railways and highways were built relatively early, and that they remarkably stimulated the economic development of the surrounding areas and even of the whole nation. Therefore, the mining result shows that railroads and roads can drive the economic growth of nearby counties and cities. This conclusion fits the facts and indicates that the mining result is credible.

CONCLUSIONS
It can be concluded that the new method of organizing spatial data based on algebraic structure can deal with topological relations effectively and efficiently, for both topological construction and maintenance. Furthermore, the association rules mining results based on it are credible.
However, a few problems still need to be considered. Even though constructing the topological relation from the algebraic structure costs little time, the efficiency of constructing the algebraic structure itself for mass data needs further study. Moreover, the containing relation is not optimized in the algebraic structure.

The 3 × 3 matrix of the 9-intersection model can distinguish 56 kinds of topological relations between objects with broad boundaries. All these methods share a common problem when extracting topological relations: they are very time consuming. Parallel to the concept of algebraic structure in the axiomatic system of set theory, topological structure has long been used to organize spatial data in GIS, as in the typical ArcInfo data format, Coverage, and the new data model after ArcInfo 8, Geodatabase. Both algebraic structure and topological structure can express topological relations. Because the algebraic structure is a combination of adjacent unit variables, it would not cause

Figure 1. Counties with GDP per capita attribute in China

Figure 2. Railroad and road data in China

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B2, 2012 XXII ISPRS Congress, 25 August - 01 September 2012, Melbourne, Australia

The process of producing the line unit algebraic structure expression can be divided into two steps. The first step is the unitization of the spatial graphic; at the same time, the unitized result is numbered in the order of unitization to constitute the index of unit variables X. The fundamental principles of unitization are as follows: every terminal point and cross point should be treated as a point unit; there must not exist any point unit in the middle of any line unit.

From the line unit algebraic structure, the topological relation can be obtained easily through different combinations of the zero-order permutation $a_0(x)$ and the first-order permutation $a_1(x)$.

Figure 3. Sample line graphic

Representing the topological relation by the line unit algebraic structure and storing the result qualitatively allows this spatial relation to be processed in the same mining way as attribute data. This step completes the association rules mining with the typical Apriori algorithm, based on the given minimum support and minimum confidence. The Apriori algorithm is a basic algorithm for mining association rules among attributes, aiming at finding relations among the items of a data set.
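The Apriori step can be sketched in a few lines of Python. This is a minimal textbook-style version, not the implementation used in the experiment: the item names (`near_railroad`, `near_road`, `gdp_high`, `gdp_low`), the five transactions and the thresholds are hypothetical, and each transaction is assumed to hold the topological predicates and the generalized attribute level of one county.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, built level by level."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while current:
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, sup, confidence))
    return found


# Hypothetical transactions, one per county.
transactions = [
    frozenset({"near_railroad", "near_road", "gdp_high"}),
    frozenset({"near_railroad", "near_road", "gdp_high"}),
    frozenset({"near_railroad", "gdp_high"}),
    frozenset({"near_road", "gdp_low"}),
    frozenset({"gdp_low"}),
]

if __name__ == "__main__":
    frequent = apriori(transactions, min_support=0.4)
    for lhs, rhs, sup, conf in rules(frequent, min_confidence=0.8):
        print(sorted(lhs), "=>", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

The lookup `frequent[lhs]` relies on the downward-closure property: every subset of a frequent itemset is itself frequent, so its support has already been recorded.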

In order to reduce the number of recorded points and to display the graph more clearly, one step before the construction of the algebraic structure is to simplify the polygons with the Point Remove algorithm, a fast and simple algorithm that reduces a polygon boundary quite effectively by removing redundant points. A relatively sketchy outline is precise enough and very effective for the display of the algebraic analysis. Figure 4 shows the visualization of the simplified polygons of the GDP per capita source data.
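The text does not spell out the Point Remove procedure. The sketch below assumes that it behaves like the Douglas-Peucker simplification, which keeps a point only if it lies farther than a tolerance from the line joining the retained neighbors; this equivalence, the function names and the sample coordinates are assumptions made for illustration.

```python
import math


def point_line_distance(p, a, b):
    """Distance from point p to the line through a and b (to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Douglas-Peucker: drop points closer than tolerance to the baseline."""
    if len(points) < 3:
        return list(points)
    distances = [point_line_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    farthest = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[farthest - 1] <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[:farthest + 1], tolerance)
    right = simplify(points[farthest:], tolerance)
    return left[:-1] + right  # the split point appears in both halves


if __name__ == "__main__":
    # Hypothetical boundary fragment with one nearly redundant point.
    boundary = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
    print(simplify(boundary, tolerance=0.1))
```

A larger tolerance removes more points and gives a sketchier outline, so the value would have to be chosen against the precision needed for the topological analysis.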

Figure 8. Topological relation: intersecting matrix

This study was supported by the National Science and Technology Pillar Program during the Twelfth Five-Year Plan Period (No. 2012BAJ15B04) and the National Natural Science Foundation of China (No. 40801152, No. 61172175 and No. 41071249).