PCINet: a Prototype- and Concept-based Interpretable Network for Multi-scene Recognition
Keywords: Aerial image interpretation, Multi-scene recognition, Network interpretability, Concept bottleneck
Abstract. With the development of remote sensing techniques, a large number of high-resolution aerial images are now available and benefit many applications. Multi-scene recognition, which refers to predicting the multiple scenes coexisting in an aerial image, plays a key role in applying remote sensing images to these applications and has attracted increasing attention. Recently, most researchers have tended to develop deep learning-based recognition models and have achieved great success. However, few efforts have been devoted to explaining the success of deep neural networks in multi-scene recognition. To address this, we introduce the concept bottleneck model (CBM) to interpret model performance and propose a novel network, namely the Prototype- and Concept-based Interpretable Network (PCINet), which projects aerial imagery into a prototype-concept memory bank and encodes their correlations to explain how a network identifies coexisting scenes in an aerial image. Specifically, the proposed network consists of two branches: a prototype matching branch that measures similarity scores between image features and scene prototypes, and a concept bottleneck branch that aligns image features with textual embeddings and computes their relations with concept embeddings. Afterwards, the outputs of the two branches are integrated to infer scene categories. Experimental results show that the model enhances interpretability, providing valuable insights for urban planning and resource management, thereby bridging the gap between deep learning models and practical applications.
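The abstract does not specify the exact implementation, but a minimal PyTorch sketch can illustrate the described two-branch design: a prototype matching branch that scores image features against scene prototypes, and a concept bottleneck branch that projects features into a textual space, scores them against concept embeddings, and predicts scenes from those concept activations. All names (PCINetSketch, vis2text, concept_head), dimensions, and the additive fusion of the two branches are assumptions made for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PCINetSketch(nn.Module):
    """Illustrative two-branch model: prototype matching + concept bottleneck (assumed design)."""

    def __init__(self, feat_dim=512, text_dim=512, num_scenes=20, num_concepts=64):
        super().__init__()
        # Learnable scene prototypes, one per scene category (assumption)
        self.prototypes = nn.Parameter(torch.randn(num_scenes, feat_dim))
        # Concept embeddings in the textual space (assumed learnable here)
        self.concepts = nn.Parameter(torch.randn(num_concepts, text_dim))
        # Projection aligning visual features with the textual embedding space
        self.vis2text = nn.Linear(feat_dim, text_dim)
        # Maps interpretable concept activations to scene logits (the "bottleneck" head)
        self.concept_head = nn.Linear(num_concepts, num_scenes)

    def forward(self, img_feat):
        # img_feat: (B, feat_dim) features from some backbone, e.g. a CNN or ViT
        # Branch 1: cosine similarity between image features and scene prototypes
        proto_logits = F.normalize(img_feat, dim=-1) @ F.normalize(self.prototypes, dim=-1).T

        # Branch 2: project to the text space, score against concept embeddings,
        # then predict scenes from the concept activations
        txt_feat = self.vis2text(img_feat)
        concept_scores = F.normalize(txt_feat, dim=-1) @ F.normalize(self.concepts, dim=-1).T
        concept_logits = self.concept_head(concept_scores)

        # Integrate both branches; multi-label scene probabilities via sigmoid
        logits = proto_logits + concept_logits
        return torch.sigmoid(logits), concept_scores


# Minimal usage example with random backbone features
model = PCINetSketch()
probs, concepts = model(torch.randn(4, 512))
print(probs.shape, concepts.shape)  # torch.Size([4, 20]) torch.Size([4, 64])
```

In this sketch the concept activations returned alongside the scene probabilities are what would provide interpretability, since each prediction can be traced back to prototype similarities and concept scores; the simple additive fusion stands in for whatever integration scheme the paper actually uses.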