The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Copernicus Publications
Volume XL-4/W3
13 Nov 2013

Using Web Crawler Technology for Text Analysis of Geo-Events: A Case Study of the Huangyan Island Incident

H. Hu and Y. J. Ge

Keywords: Web crawler technology; text information; sentiment analysis; Huangyan Island incident

Abstract. As social networking and network socialisation have brought more text information and social relationships into our daily lives, the question of whether big data can be fully exploited to study natural phenomena and the laws of the natural sciences has prompted many specialists and scholars to innovate in their research. Politics has been integrally involved in hyperlinked-world issues since the 1990s, and the automatic assembly of geospatial web services and distributed geospatial information systems through service chaining has recently been explored and built. Even so, information collection and data visualisation for geo-events still face the bottleneck of traditional manual analysis, owing to the sensitivity, complexity, interrelatedness, timeliness and unpredictability of political events. Building on the Heritrix framework and the analysis of web-based text, this study combines web crawler technology with text analysis methods to examine the word frequency, sentiment tendency and dissemination path of the Huangyan Island incident. The results indicate that the tag cloud, frequency map, attitude pie chart, individual mention ratios and dissemination flow graph derived from the collected and processed data highlight not only the subject and theme vocabulary of related topics but also certain issues and problems behind them. Because it can express the spatio-temporal relationships of text information and disseminate information about geo-events, text analysis of network information based on focused web crawler technology can serve as a tool for understanding how web-based public opinion on political events forms and spreads.
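The abstract does not describe how word frequency or sentiment tendency were computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes crawled pages have already been reduced to plain English text; the stop-word list and the positive/negative lexicons are hypothetical placeholders. Word counts of this kind are the usual input to a tag cloud or frequency map, and per-document sentiment labels can be aggregated into the shares shown in an attitude pie chart.

```python
import re
from collections import Counter

# Hypothetical stop words and sentiment lexicons, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "at"}
POSITIVE = {"peaceful", "support", "agree", "cooperation"}
NEGATIVE = {"tension", "protest", "dispute", "oppose"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(docs: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Count non-stop-word tokens across all documents (tag-cloud input)."""
    counts: Counter[str] = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts.most_common(top_n)


def sentiment_label(doc: str) -> str:
    """Label a document by comparing positive and negative lexicon hits."""
    tokens = tokenize(doc)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def attitude_shares(docs: list[str]) -> dict[str, float]:
    """Fraction of documents per sentiment label (attitude pie-chart data)."""
    labels = Counter(sentiment_label(d) for d in docs)
    total = len(docs) or 1  # avoid division by zero on an empty corpus
    return {k: labels[k] / total for k in ("positive", "negative", "neutral")}


if __name__ == "__main__":
    sample = [
        "Protest over the island dispute grows",
        "Leaders support peaceful cooperation at the island",
        "Fishing boats remain near the island",
    ]
    print(word_frequencies(sample, 3))
    print(attitude_shares(sample))
```

A lexicon count is the simplest possible sentiment classifier and ignores negation and context; real work on Chinese-language news and forum posts would also need a word segmenter in place of the regex tokenizer.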