The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Volume XLVIII-2/W12-2026
https://doi.org/10.5194/isprs-archives-XLVIII-2-W12-2026-527-2026
12 Feb 2026

Saliency-Driven View Planning for Cultural Heritage Guided Tours

Tian Zhang and Sagi Filin

Keywords: Neural saliency, Deep learning, View selection, Point cloud visualization

Abstract. Three-dimensional point clouds are a key form of documentation for heritage site interpretation and conservation management. Nonetheless, their unstructured organization and lack of semantic information inhibit the communication and interpretation of the information they hold. The development of solutions that highlight key entities of a scanned object or within a scanned scene is therefore imperative. In addition, proposing views that are intuitive to interpret can support navigation through the usually vast data volumes. To date, highlighting entities and creating viewpoint representations have relied on expert annotation or on handcrafted saliency measures combined with heuristic optimization. These are typically designed for small, watertight objects, making them noise-sensitive, labor-intensive, and difficult to scale to large sites. In this paper, we introduce a neural framework that highlights salient regions and proposes key views capturing the essence of the scanned object or scene. We detect saliency by following a heat-diffusion-driven objective and learning data-adaptive point representations. We further capture global saliency through clustering, followed by pairwise comparison of the clusters. This translates into high-quality saliency predictions that emphasize the most visually and semantically interesting regions. We also propose a greedy viewpoint selection strategy that captures the most meaningful views while remaining efficient on large-scale data. Our approach outperforms state-of-the-art neural saliency-detection methods on both small- and large-scale objects and scenes. Our model highlights key views and facilitates human-centric tours and best-view selection. The proposed method processes 14M points in under 15 seconds, compared with nearly four hours for existing state-of-the-art models, making it computationally appealing.
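
A short, hedged sketch may make the greedy viewpoint selection idea concrete. The Python snippet below implements a generic greedy coverage strategy of the kind the abstract describes: each candidate view is credited with the saliency of the points it would newly reveal, and views are chosen one at a time by largest marginal gain. The function name, the boolean visibility matrix, and the toy data are illustrative assumptions, not the authors' actual implementation.

    import numpy as np

    def greedy_view_selection(visibility, saliency, k):
        """Greedily pick up to k views that maximize newly covered saliency.

        visibility : (n_views, n_points) boolean matrix; entry [v, p] is True
                     when point p is visible from candidate view v (assumed
                     precomputed, e.g., by rendering or ray casting).
        saliency   : (n_points,) array of nonnegative per-point saliency scores.
        k          : number of views to select.
        """
        covered = np.zeros(visibility.shape[1], dtype=bool)
        selected = []
        for _ in range(k):
            # Marginal gain of each view: total saliency of the points it
            # would reveal that no already-selected view has covered yet.
            gains = (visibility & ~covered).astype(float) @ saliency
            best = int(np.argmax(gains))
            if gains[best] <= 0.0:  # no candidate adds new salient coverage
                break
            selected.append(best)
            covered |= visibility[best]
        return selected

    # Toy usage: 4 candidate views over 6 points.
    rng = np.random.default_rng(0)
    visibility = rng.random((4, 6)) > 0.5
    saliency = rng.random(6)
    print(greedy_view_selection(visibility, saliency, k=2))

Because covered saliency is a monotone submodular function of the selected view set, this greedy rule carries the classic (1 - 1/e) approximation guarantee, one reason greedy selection is a common choice when view planning must stay efficient on large-scale data.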
