LARGE VECTOR SPATIAL DATA STORAGE AND QUERY PROCESSING USING CLICKHOUSE
Keywords: ClickHouse, vector spatial data, query processing, HBase, remote sensing
Abstract. The exponential growth of geospatial data resulting from the development of earth observation technology has created significant challenges for traditional relational databases. While NoSQL databases based on distributed file systems can handle massive data storage, they often struggle to cope with real-time query. Column-storage databases, on other hand, are highly effective at both storage and query processing for large-scale datasets. In this paper, we propose a spatial version of ClickHouse that leverages R-Tree indexing to enable efficient storage and real-time analysis of massive remote sensing data. ClickHouse is a column-oriented, open-source database management system designed for handling large-scale datasets. By integrating R-Tree indexing, we have created a highly efficient system for storing and querying geospatial data. To evaluate the performance of our system, we compare it with HBase, a popular distributed, NoSQL database system. Our experimental results show that ClickHouse outperforms HBase in handling spatial data queries, with a response time approximately three times faster than HBase. We attribute this performance gain to the highly efficient R-Tree indexing used in ClickHouse, which allows for fast spatial data query.