TOPIC MODELING AND ASSOCIATION RULE MINING TO DISCOVER GEOSPATIAL SEMANTIC INFORMATION FROM UNSTRUCTURED DATA SOURCES
Keywords: Geospatial Knowledge, Semantic Information, Unstructured Data, Topic Modeling, Association Rules, Latent Dirichlet Allocation, FP-Growth
Abstract. As the number of semi-structured and unstructured information sources grows at an exponential rate, there is a growing demand for eliciting the semantic information implicit in these sources. Semantic information elicitation processes such as semantic information extraction, linking, and annotation aim to make this knowledge explicit and to unveil latent aspects of the sources in support of knowledge discovery, semantic analysis, and visualization. This paper describes the implementation of Latent Dirichlet Allocation (LDA) topic modeling and association rule mining with FP-Growth for knowledge discovery. RapidMiner, an open-source data mining software package, is used to carry out this work.