The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume XLVIII-M-9-2025
https://doi.org/10.5194/isprs-archives-XLVIII-M-9-2025-1213-2025
03 Oct 2025

Towards a Curatorial Agent for Heritage Institutions: Web Source Credibility Verification for Grounding Domain-Specific LLMs

Aleksandra Pshenova and Jaehong Ahn

Keywords: Hallucinations, BERT, BART, URL Classification, Source Credibility, Question Generation

Abstract. The hallucination problem is the main cause of the weak reliability of Large Language Models (LLMs) in cultural institutions such as museums and galleries. One proposed solution is to ground the LLM in real data found on the Web. However, since the cultural heritage domain requires factual accuracy, cultural institutions cannot fully rely on data obtained from the Web. To make such data suitable for heritage use cases, additional source filtering and verification must be applied. In this paper, we propose a pipeline for verifying web sources, as well as a question-generating agent designed to guide heritage experts in collecting the right sources for their needs. Upon evaluation, the proposed system successfully filters web-scraped sources given a search keyword, achieving moderate results in both classification tasks. In addition, our contributions include the curation of a custom dataset for training both models and the estimation of an optimal training and dataset configuration for the proposed 'curatorial question generation' task.
