MAPPING GLACIER CHANGES USING CLUSTERING TECHNIQUES ON CLOUD COMPUTING INFRASTRUCTURE
Keywords: Remote Sensing, Big Data, Cloud Computing, Glacier Changes, Clustering Techniques
Abstract. Climate change and its effects are receiving increasing attention, and glaciers are among the most affected ecosystems, since the energy of the Earth’s surface and its temperature may be directly related to temporal changes in glaciers. Glacier behaviour, in particular retreat and melting under critical conditions, can be understood through the analysis of Remote Sensing data; however, given the unprecedented volumes of data currently delivered by satellite sensors, this analysis can be regarded as a big data problem. Machine learning techniques have the potential to improve the analysis of this type of data, but most current machine learning algorithms cannot properly process such large volumes. To overcome the computational limitations of Remote Sensing Big Data analysis, we implemented the K-Means and Expectation Maximization algorithms as distributed clustering solutions that exploit the capabilities of cloud computing infrastructure to process very large datasets. The solution was developed on top of the InterCloud Data Mining Package, a suite of distributed classification methods previously employed in hyperspectral image analysis. In this work we extended that package so that it can process multispectral images using the aforementioned clustering algorithms. To validate our proposal, we analysed the Ausangate glacier, located in the Andes Mountains in Peru, mapping the changes in this environment through a multi-temporal Remote Sensing analysis. Our results and conclusions focus on the thematic accuracy and the computational performance achieved by the proposed solution. Thematic accuracy was assessed by comparing the glacier areas automatically detected by the clustering approaches against manually selected ground truth data.
Computational performance was assessed by comparing the computational load of executing the clustering processes sequentially and in a distributed fashion, using local-mode and cluster configurations on a cloud computing infrastructure.
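To make the idea of distributed K-Means concrete, one iteration can be expressed as a map step over data partitions followed by a reduce step that merges partial results. The following is a minimal single-machine sketch in Python under stated assumptions: it is not the InterCloud Data Mining Package implementation, the function names (`map_partition`, `reduce_partials`, `kmeans`) are hypothetical, and the pixel values are toy data rather than real multispectral measurements. In an actual cluster each partition would be processed on a different node.

```python
from typing import List, Sequence, Tuple

Pixel = Sequence[float]                      # one value per spectral band
Partial = Tuple[List[List[float]], List[int]]  # per-cluster sums and counts


def nearest(pixel: Pixel, centroids: Sequence[Pixel]) -> int:
    """Index of the centroid closest to the pixel (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for k, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(pixel, c))
        if d < best_d:
            best, best_d = k, d
    return best


def map_partition(pixels: Sequence[Pixel], centroids: Sequence[Pixel]) -> Partial:
    """Map step: assign each pixel of one partition and accumulate sums/counts."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for px in pixels:
        k = nearest(px, centroids)
        counts[k] += 1
        for j, v in enumerate(px):
            sums[k][j] += v
    return sums, counts


def reduce_partials(partials: Sequence[Partial]) -> Partial:
    """Reduce step: merge the partial sums and counts of all partitions."""
    sums = [row[:] for row in partials[0][0]]
    counts = list(partials[0][1])
    for p_sums, p_counts in partials[1:]:
        for k, row in enumerate(p_sums):
            counts[k] += p_counts[k]
            for j, v in enumerate(row):
                sums[k][j] += v
    return sums, counts


def kmeans(partitions, centroids, iterations: int = 10) -> List[List[float]]:
    """Run a fixed number of map/reduce iterations and return the centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        # On a cluster, each map_partition call would run on a separate node.
        partials = [map_partition(p, centroids) for p in partitions]
        sums, counts = reduce_partials(partials)
        # Empty clusters keep their previous centroid.
        centroids = [
            [s / n for s in row] if n else old
            for row, n, old in zip(sums, counts, centroids)
        ]
    return centroids


# Toy usage: two partitions of two-band "pixels", two clusters.
partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
result = kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)])
```

Only the per-cluster sums and counts travel between the map and reduce steps, so the amount of data exchanged per iteration depends on the number of clusters and bands, not on the number of pixels.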