CLUSTERING AND INDEXING HISTORIC VESSEL MOVEMENT DATA WITH SPACE FILLING CURVES
Keywords: vessel movement, space filling curves, clustering, indexing, space-time queries, query framework
Abstract. This paper reports on the result of an on-going study using Space Filling Curves (SFCs) for indexing and clustering vessel movement message data (obtained via the Automated Identification System, AIS) inside a geographical Database Management System (Geo-DBMS). With AIS, vessels transmit their positions in intervals ranging from 2 seconds to 3 minutes. Every 6 minutes voyage related information is broadcast.
Relevant AIS messages contain a position, timestamp and vessel identifier. This information can be stored in a DBMS as separate columns with different types (as 2D point plus time plus identifier), or in an integrated column (as higher dimensional 4D point which is encoded as the position on a space filling curve, that we will call the SFC-key). Subsequently, indexing based on this SFC-key column can replace separate indexes (where this one integrated index will need less storage space than separate indexes). Moreover, this integrated index allows a good clustering (physical ordering of the table). Also, in an approach with separate indexes for location, time and object identifier the query optimizer inside a DBMS has to estimate which index is most selective for a given query. It is not possible to use two indexes at the same time – e.g. in case of a space-time query. An approach with one multi-dimensional integrated index does not have this problem. It results in faster query responses when specifying multiple selection criteria; i.e. both search geometry and time interval.
We explain the steps needed to make this SFC approach available fully inside a DBMS (to avoid expensive data transfer to external programs during use). The SFC approach makes it possible to better cluster the (spatio-temporal) data compared to an approach with separate indexes. Moreover, we show experiments (with 723,853,597 AIS position report messages spanning 3 months, Sep–Dec 2016, using data for Europe, both on-sea and inland water ways) to compare an approach based on one multi-dimensional integrated index (using a SFC) with non-integrated approach. We analyze loading time (including SFC encoding) and storage requirements, together with the speed of execution of queries and granularity of answers.
Conclusion is that time spend on query execution in case of space-time queries where both dimensions are selective using the integrated SFC approach outperforms the non-integrated approach (typically a factor 2–6). Also, the SFC approach saves considerably on storage space (less space needed for indexes). Lastly, we propose some future improvements to get some better query performance using the SFC approach (e.g. IOT, range-glueing and nD-histogram).