A Scene Graph Generation Method for Historical District Street-view Imagery: A Case Study in Beijing, China
Keywords: Scene Graph, Street-view, Historical Districts, Image Captioning
Abstract. Interpreting the diverse street-scale elements of historical districts, and the relationships among them, from street-view imagery offers an efficient, low-cost basis for preservation and management. Scene graphs provide a structured representation of the objects in a scene and the relationships between them. However, applying existing scene graph generation techniques directly to street-view imagery is challenging because of the complexity of the elements and the narrowness of the street spaces. This paper introduces HSSGG (Historical Street-view Scene Graph Generation), a predictive model that identifies elements and their relationships. By combining an end-to-end Relation Transformer with parameter-free attention and coordinate attention modules, HSSGG improves relationship prediction accuracy, even with limited samples, and enhances the precision of scene graph generation in complex environments. Tests on 200 panoramic images from historical districts in Beijing show that HSSGG outperforms existing single-stage relation prediction models (such as RelTR and FCSGG) in accuracy and stability. These results provide valuable insights for the preservation and management of historical districts.
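To make the notion of a scene graph concrete, the following minimal Python sketch stores detected objects as nodes and relationships as (subject, predicate, object) triples. It is illustrative only and is not the HSSGG implementation: the class names, the labels ("door", "wall", "bicycle"), the predicates, and the bounding-box values are all hypothetical and do not come from the paper's dataset.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """A detected element; the fields are illustrative assumptions."""
    obj_id: int
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class SceneGraph:
    """Objects are nodes; relations are directed, labelled edges."""
    objects: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (subject_id, predicate, object_id)

    def add_object(self, obj):
        self.objects[obj.obj_id] = obj

    def add_relation(self, subject_id, predicate, object_id):
        # Reject edges that point at objects not yet in the graph.
        if subject_id not in self.objects or object_id not in self.objects:
            raise KeyError("both endpoints must be added before relating them")
        self.relations.append((subject_id, predicate, object_id))

    def triples(self):
        """Return human-readable (subject, predicate, object) label triples."""
        return [
            (self.objects[s].label, p, self.objects[o].label)
            for s, p, o in self.relations
        ]


def build_example_graph():
    # Hypothetical street scene; every value here is made up for illustration.
    graph = SceneGraph()
    graph.add_object(SceneObject(0, "door", (120, 200, 180, 330)))
    graph.add_object(SceneObject(1, "wall", (0, 100, 400, 340)))
    graph.add_object(SceneObject(2, "bicycle", (200, 260, 290, 340)))
    graph.add_relation(0, "on", 1)
    graph.add_relation(2, "in front of", 1)
    return graph


if __name__ == "__main__":
    for triple in build_example_graph().triples():
        print(triple)
```

A scene graph generation model would predict both the objects and the relation triples from an image; this sketch shows only the data structure that such predictions would populate.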