Assessing the Generalization Capacity of Convolutional Neural Networks and Vision Transformers for Deforestation Detection in Tropical Biomes
Keywords: Deforestation Detection, Deep Learning, Convolutions, Transformers, Domain Shift
Abstract. Deep Learning (DL) models, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), have become popular for change detection tasks, including deforestation mapping. However, little attention has been paid to domain shift, which degrades classification performance when pre-trained models are applied to areas with different forest cover and deforestation practices. This study compares DL methods for deforestation detection, focusing on how well CNNs and ViTs generalize under domain shift. Two models, DeepLabv3+ (CNN-based) and UNETR (ViT-based), were trained using remote sensing images and reference data from one location and then tested at other sites to simulate real-world scenarios. The results showed that the ViT-based architecture performed better when trained and tested in the same region but generalized worse in cross-domain scenarios. We consider this a work in progress: confirming these findings requires further research, including the evaluation of additional architectures on a wider range of domains.
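The train-on-one-site, test-on-others protocol described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the F1 score of the deforestation class as the metric, the site names `site_A` and `site_B`, the helper names, and the toy threshold "model" that stands in for a trained network such as DeepLabv3+ or UNETR. The numbers are synthetic and do not reproduce any reported result.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask: 1 = deforestation, 0 = no change


def f1_score(pred: Mask, ref: Mask) -> float:
    """F1 of the positive (deforestation) class for two equal-length masks."""
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must have the same length")
    tp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(pred, ref) if p == 0 and r == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cross_domain_scores(
    models: Dict[str, Callable[[List[float]], List[int]]],
    test_sets: Dict[str, Tuple[List[float], List[int]]],
) -> Dict[str, Dict[str, float]]:
    """Score every model (keyed by its training site) on every test site."""
    scores: Dict[str, Dict[str, float]] = {}
    for train_site, predict in models.items():
        scores[train_site] = {
            test_site: f1_score(predict(image), ref)
            for test_site, (image, ref) in test_sets.items()
        }
    return scores


def generalization_gap(scores: Dict[str, Dict[str, float]], train_site: str) -> float:
    """In-domain F1 minus the mean F1 over all other (cross-domain) sites."""
    row = scores[train_site]
    others = [v for site, v in row.items() if site != train_site]
    if not others:
        raise ValueError("need at least one cross-domain test site")
    return row[train_site] - sum(others) / len(others)


# Synthetic toy data (hypothetical): per-pixel change scores and reference masks.
toy_test_sets = {
    "site_A": ([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]),
    "site_B": ([0.6, 0.4, 0.3, 0.1], [1, 1, 0, 0]),
}
# Stand-in for a model trained on site_A: a fixed threshold tuned for that site.
toy_models = {"site_A": lambda image: [1 if x > 0.5 else 0 for x in image]}

toy_scores = cross_domain_scores(toy_models, toy_test_sets)
print(toy_scores)
print(generalization_gap(toy_scores, "site_A"))
```

A larger gap between the in-domain score and the mean cross-domain score indicates weaker generalization, which is the kind of comparison the abstract draws between the two architectures.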