Key Spatial Relations-based Focused Crawling (KSRs-FC) for Borderlands Situation Analysis
Keywords: Focused Crawling, Place Names, Web Information Collection, Borderlands Situation Analysis, Relevance Calculation, Spatial Relations
Abstract. Place names play an important role in Borderlands Situation topics, while current focused crawling methods treat them in the same way as other common keywords, which may lead to the omission of many useful web pages. In the paper, place names in web pages and their spatial relations were firstly discussed. Then, a focused crawling method named KSRs-FC was proposed to deal with the collection of situation information about borderlands. In this method, place names and common keywords were represented separately, and some of the spatial relations related to web pages crawling were used in the relevance calculation between the given topic and web pages. Furthermore, an information collection system for borderlands situation analysis was developed based on KSRs-FC. Finally, F-Score method was adopted to quantitatively evaluate this method by comparing with traditional method. Experimental results showed that the F-Score value of the proposed method increased by 11% compared to traditional method with the same sample data. Obviously, KSRs-FC method can effectively reduce the misjudgement of relevant webpages.