CLUSTER ANALYSIS: A COMPREHENSIVE AND VERSATILE QGIS PLUGIN FOR PATTERN RECOGNITION IN GEOSPATIAL DATA
Keywords: Clustering, Feature selection, Cluster evaluation, Pattern recognition, QGIS Plugin, Python, FOSS4G
Abstract. As geospatial data continuously grows in complexity and size, the application of Machine Learning and Data Mining techniques to geospatial analysis is increasingly essential for solving real-world problems. Although research in this field has produced innovative methodologies over the last two decades, they are usually applied to specific situations and not automated for general use. Therefore, both the generalization of these methods and their integration with Geographic Information Systems (GIS) are necessary to support researchers and organizations in data exploration, pattern recognition, and prediction across the various applications of geospatial data. In this work, we present Cluster Analysis, a Python plugin that we developed for the open-source software QGIS and that offers functionalities for the entire clustering process. Our tool provides several improvements over the solutions currently available in QGIS and in other widespread GIS software. The expanded features provided by the plugin allow users to deal with some of the most challenging problems of geospatial data, such as high-dimensional spaces, poor data quality, and large data size. To highlight both the potential of the plugin and its limitations in real-world scenarios, the development is accompanied by an extensive experimental phase with data of different natures and granularities. Overall, the experimental phase shows that the plugin is adequately flexible, and it outlines possibilities for future developments, which can also be contributed by the QGIS community given the open-source nature of the project.