An Approach to Searching for Hieroglyphs in Chinese Manuscript Archives Based on Morphological Analysis
Keywords: Hieroglyph, Metagraph, Continuous Skeleton, Graph Similarity, Efficient Recognition
Abstract. Searching for Chinese characters in large archives of handwritten documents using a single sample as the query is a so-called one-shot (single-sample) recognition task. The number of distinct characters in archives of ancient Chinese handwritten texts is estimated at several tens of thousands. This makes machine-learning methods hard to apply, because constructing training datasets of that scale is difficult. This article proposes an effective method for recognizing and searching for characters by directly comparing the shape of the query character with the characters in the archive. We propose a method for constructing a hieroglyph model in the form of a planar geometric graph. We also propose a measure of similarity (and dissimilarity) between the geometric graphs of hieroglyphs. These graphs are generally non-isomorphic, and the measure is computed by solving an assignment problem. Finally, we propose a recognition method based on this measure. Computational experiments on large databases of handwritten hieroglyphs confirmed the effectiveness of the proposed approach. It achieves results comparable to those of deep-learning methods while running on conventional modern computers without hardware accelerators. In addition, the method is fully interpretable, which is important for understanding and adjusting the recognition process, as well as for further development of the approach.
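The abstract states that similarity between generally non-isomorphic glyph graphs is obtained by solving an assignment problem. The following is a minimal sketch of that general idea only, under assumptions that are ours and not the paper's. Each graph is reduced to its vertices, given as hypothetical `(x, y, degree)` tuples with coordinates assumed to be normalized. Matching two vertices costs their Euclidean distance plus a hypothetical `degree_weight` times their degree difference. Leaving a vertex unmatched costs a hypothetical `penalty`. The cost matrix is padded to size n + m, a common construction in approximate graph edit distance, so that vertices of either graph may remain unmatched and graphs of different sizes can still be compared. The assignment itself is solved with the classical Hungarian algorithm. Edge structure is ignored here.

```python
import math

BIG = 1e9  # finite stand-in for "forbidden" pairs (avoids inf - inf = nan)


def hungarian(cost):
    """Minimum-cost perfect assignment on a square matrix (Hungarian algorithm).

    Returns (total_cost, assignment) where assignment[row] = column.
    """
    n = len(cost)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row / column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)    # p[col] = row matched to col (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return sum(cost[r][assignment[r]] for r in range(n)), assignment


def glyph_distance(a, b, penalty=1.0, degree_weight=0.5):
    """Hypothetical dissimilarity between two glyph graphs given as vertex lists.

    a, b: lists of (x, y, degree). Lower is more similar; 0 for identical lists.
    """
    n, m = len(a), len(b)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:  # substitute vertex a[i] by b[j]
                (xa, ya, da), (xb, yb, db) = a[i], b[j]
                cost[i][j] = math.hypot(xa - xb, ya - yb) + degree_weight * abs(da - db)
            elif i < n:  # a[i] left unmatched (only via its own dummy column)
                cost[i][j] = penalty if j - m == i else BIG
            elif j < m:  # b[j] left unmatched (only via its own dummy row)
                cost[i][j] = penalty if i - n == j else BIG
            # dummy-to-dummy pairs stay at cost 0
    total, _ = hungarian(cost)
    return total


glyph_a = [(0.0, 0.0, 1), (1.0, 0.0, 2), (1.0, 1.0, 1)]
glyph_b = [(0.0, 0.1, 1), (1.0, 0.0, 2)]
print(glyph_distance(glyph_a, glyph_a))  # identical graphs
print(glyph_distance(glyph_a, glyph_b))  # different vertex counts
```

With the padded matrix, a vertex pair whose substitution cost exceeds twice the `penalty` is cheaper to treat as one deletion plus one insertion. This keeps a single spurious stroke junction from dominating the score. How such costs and penalties should actually be chosen for hieroglyph skeletons is specified by the authors' method, not by this sketch.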