Towards a Curatorial Agent for Heritage Institutions: Web Source Credibility Verification for Grounding Domain-Specific LLMs

Pshenova, Aleksandra; Ahn, Jaehong

doi:https://doi.org/10.5194/isprs-archives-XLVIII-M-9-2025-1213-2025

Articles | Volume XLVIII-M-9-2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

03 Oct 2025

Keywords: Hallucinations, BERT, BART, URL Classification, Source Credibility, Question Generation

Abstract. The hallucination problem is the main cause of the weak reliability of Large Language Models (LLMs) when used in cultural institutions such as museums and galleries. One proposed solution is to ground the LLM in real data found on the Web. However, since the cultural heritage domain requires factual accuracy, cultural institutions cannot fully rely on data obtained from the Web. To make such data suitable for heritage use cases, additional source filtering and verification must be applied. In this paper, we propose a candidate source verification pipeline for web sources, as well as a question-generating agent designed to guide heritage experts in collecting the right sources for their needs. Upon evaluation, the proposed system successfully filters web-scraped sources given a search keyword, achieving moderate results in both classification tasks. In addition, our contributions include the curation of a custom dataset for training both models and the estimation of an optimal training and dataset configuration for the proposed "curatorial question generation" task.
