<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">ISPRS-Archives</journal-id>
<journal-title-group>
<journal-title>The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences</journal-title>
<abbrev-journal-title abbrev-type="publisher">ISPRS-Archives</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2194-9034</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/isprs-archives-XLVIII-G-2025-597-2025</article-id>
<title-group>
<article-title>A Novel Correspondence Model for Linking Objects and Texts in Construction Plans</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Hong</surname>
<given-names>Shuwei</given-names>
<ext-link>https://orcid.org/0009-0009-7673-9841</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Landgraf</surname>
<given-names>Steven</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Hillemann</surname>
<given-names>Markus</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Ulrich</surname>
<given-names>Markus</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology, Karlsruhe, Germany</addr-line>
</aff>
<pub-date pub-type="epub">
<day>28</day>
<month>07</month>
<year>2025</year>
</pub-date>
<volume>XLVIII-G-2025</volume>
<fpage>597</fpage>
<lpage>604</lpage>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2025 Shuwei Hong et al.</copyright-statement>
<copyright-year>2025</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://isprs-archives.copernicus.org/articles/XLVIII-G-2025/597/2025/isprs-archives-XLVIII-G-2025-597-2025.html">This article is available from https://isprs-archives.copernicus.org/articles/XLVIII-G-2025/597/2025/isprs-archives-XLVIII-G-2025-597-2025.html</self-uri>
<self-uri xlink:href="https://isprs-archives.copernicus.org/articles/XLVIII-G-2025/597/2025/isprs-archives-XLVIII-G-2025-597-2025.pdf">The full text article is available as a PDF file from https://isprs-archives.copernicus.org/articles/XLVIII-G-2025/597/2025/isprs-archives-XLVIII-G-2025-597-2025.pdf</self-uri>
<abstract>
<p>Construction plans integrate visual and textual information that is essential for construction projects. However, the wide variety of plan formats poses challenges for automated analysis. This paper presents a novel correspondence model that links objects and texts in construction plans, providing a unified approach to interpreting various formats, such as scanned blueprints, CAD drawings, and digital construction documents. Leveraging deep-learning-based object detection and text recognition techniques, our model establishes semantic correspondences between visual and textual elements. We integrate CLIP-based models with ViT-based encoders to enhance feature extraction and correspondence learning. By applying a threshold-based decision, our model resolves cases where a single text passage describes multiple objects or where a single object is referenced by multiple text passages. This capability enables the model to establish robust correspondences between objects and texts, laying a foundation for subsequent semantic understanding and information extraction. We evaluate the model on labeled datasets and demonstrate that it achieves high precision, recall, F1-score, and accuracy. Hence, we provide a feasible approach to establishing object-text correspondences in construction plan analysis. The results suggest that the model can serve as a basis for further work on the automated analysis of technical drawings, particularly in the context of quality assurance and construction project planning.</p>
</abstract>
<counts><page-count count="8"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>