ISPRS-Archives

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.

2194-9034

Copernicus Publications

Göttingen, Germany

10.5194/isprs-archives-XLVIII-G-2025-597-2025

A Novel Correspondence Model for Linking Objects and Texts in Construction Plans

Shuwei Hong (https://orcid.org/0009-0009-7673-9841)

Steven Landgraf

Markus Hillemann

Markus Ulrich

Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology, Karlsruhe, Germany

28 July 2025

Volume XLVIII-G-2025, pages 597–604

2025

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://isprs-archives.copernicus.org/articles/XLVIII-G-2025/597/2025/isprs-archives-XLVIII-G-2025-597-2025.html

The full text article is available as a PDF file from https://isprs-archives.copernicus.org/articles/XLVIII-G-2025/597/2025/isprs-archives-XLVIII-G-2025-597-2025.pdf

Construction plans integrate visual and textual information that is essential for construction projects. However, the wide diversity of plan formats poses challenges for automated analysis. This paper presents a novel correspondence model that links objects and texts in construction plans, providing a unified approach to interpreting various formats, such as scanned blueprints, CAD drawings, and digital construction documents. Leveraging deep-learning-based object detection and text recognition techniques, our model establishes semantic correspondences between visual and textual elements. We integrate CLIP-based models with ViT-based encoders to enhance feature extraction and correspondence learning. By employing a threshold-based determination, our model resolves cases where a single text passage describes multiple objects or where a single object is referenced by multiple pieces of text. This capability enables the model to establish robust correspondences between objects and texts, laying a foundation for subsequent semantic understanding and information extraction. We evaluate the model on labeled datasets and demonstrate that it achieves high precision, recall, F1-score, and accuracy. Hence, we provide a feasible approach to establishing object-text correspondences in construction plan analysis. The results suggest that the model can serve as a foundation for further exploration in the automated analysis of technical drawings, particularly in the context of quality assurance and construction project planning.
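
The threshold-based determination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the threshold value, or the embedding dimensions, so everything below is an assumption: cosine similarity stands in for the learned correspondence score, the two-dimensional vectors are hand-written toy stand-ins for CLIP/ViT features, and the names `link_pairs` and `pair_metrics` are hypothetical. The point is only to show how a single threshold on pairwise scores yields many-to-many links, and how precision, recall, F1-score, and accuracy can be computed over all candidate object-text pairs.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def link_pairs(object_embs, text_embs, threshold):
    """Keep every (object_index, text_index) pair scoring at or above the threshold.

    Each pair is decided independently, so one text may link to several
    objects and one object may link to several texts.
    """
    return {
        (i, j)
        for i, obj in enumerate(object_embs)
        for j, txt in enumerate(text_embs)
        if cosine(obj, txt) >= threshold
    }


def pair_metrics(predicted, truth, n_objects, n_texts):
    """Precision, recall, F1 and accuracy over all n_objects * n_texts candidate pairs."""
    total = n_objects * n_texts
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = total - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy embeddings (assumed values, not taken from the paper).
object_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
text_embs = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.9]]

# Assumed threshold; in practice it would be tuned on labeled data.
links = link_pairs(object_embs, text_embs, threshold=0.8)

# Hypothetical ground-truth correspondences for the toy example.
truth = {(0, 0), (1, 0), (2, 1)}
metrics = pair_metrics(links, truth, len(object_embs), len(text_embs))

print(sorted(links))
print(metrics)
```

In this toy setup, text 0 is linked to objects 0 and 1, and object 2 is linked to texts 1 and 2, which are the two many-to-many cases the abstract describes. A one-to-one assignment such as an argmax per object would miss one of these links, which is one plausible reason to prefer a threshold.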