AN ANALYSIS OF THE INFLUENCE OF THE NUMBER OF OBSERVATIONS IN A RANDOM FOREST TIME SERIES CLASSIFICATION TO MAP THE FOREST AND DEFORESTATION IN THE BRAZILIAN AMAZON
Keywords: Classification, Random Forest, Landsat, Time series, Land cover, Brazilian Amazon
Abstract. Remote sensing has been an essential tool in combating deforestation. However, ever-rising deforestation rates call for new remote sensing techniques. This paper presents a study of how varying the number of satellite observations affects the accuracy of a Random Forest (RF) time series classification. We carried out experiments on a Landsat-8 data cube with 22 images and developed an automatic sampling system based on PRODES to generate the labeled time series. We split the time series dataset to build data subsets with different numbers of observations. The RF classifiers were compared using a random test dataset, on which all classifiers achieved an Overall Accuracy (OA), Balanced Accuracy (BA), and F1-score (F1) above 97%. In this first evaluation, the number of observations appeared to have little influence on classification accuracy. We then used the reference map to assess the RF classifiers' results. In this second evaluation, the best OA was obtained with fewer observations: the best performance, 96%, was achieved with four observations. We also evaluated the performance of the deforestation and forest classes individually. The results showed that fewer observations negatively affected the accuracy of the RF algorithm for deforested areas, but not for forest areas. Finally, we evaluated the visual quality of the land cover maps produced.
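The evaluation described above involves two mechanical steps: reducing each 22-date time series to a subset with fewer observations, and scoring forest/deforestation predictions with OA, BA, and F1. The sketch below illustrates both under stated assumptions. It assumes subsets are formed by keeping k evenly spaced dates (the abstract does not specify the splitting rule), and it computes F1 for the deforestation class only (the paper may instead report a per-class or averaged F1). The function names `subsample_dates` and `binary_metrics` are hypothetical and are not taken from the study.

```python
"""Sketch (hypothetical helpers): subsample a time series to k observations
and score binary forest/deforestation predictions with OA, BA, and F1.
Assumption: subsets keep k evenly spaced dates; the study's exact rule may differ.
Assumption: F1 is computed for the positive (deforestation) class only."""
from typing import Sequence


def subsample_dates(series: Sequence[float], k: int) -> list[float]:
    """Keep k evenly spaced observations, always including the first and last."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")
    if k == 1:
        return [series[0]]
    # Evenly spaced positions from 0 to n-1, rounded to valid indices.
    idx = [round(i * (n - 1) / (k - 1)) for i in range(k)]
    return [series[i] for i in idx]


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "deforestation") -> dict[str, float]:
    """Overall accuracy, balanced accuracy, and positive-class F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    oa = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall of the other class
    ba = (recall + specificity) / 2                    # mean per-class recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"OA": oa, "BA": ba, "F1": f1}


if __name__ == "__main__":
    # Toy series of 22 dates (stand-in values, not real Landsat-8 data).
    series = list(range(22))
    print(subsample_dates(series, 4))  # dates at positions 0, 7, 14, 21

    # Toy labels, purely illustrative.
    y_true = ["forest", "forest", "deforestation", "deforestation"]
    y_pred = ["forest", "deforestation", "deforestation", "deforestation"]
    print(binary_metrics(y_true, y_pred))
```

Balanced accuracy is included alongside OA because it averages per-class recall, so a classifier cannot score well simply by favoring the majority class. This matters when deforestation samples are far less numerous than forest samples.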