RECOGNIZING REFERENCES TO PHYSICAL PLACES INSIDE SHORT TEXTS BY USING PATTERNS AS A SEQUENCE OF GRAMMATICAL CATEGORIES
Keywords: Short messages, patterns, GSP, grammatical categories, location identification, crowdsourcing, reports
Abstract. Collecting data by crowdsourcing is a well-explored way to support the population and updating of databases. Such data is unstructured and comes from text, in particular text posted on social networks. A geographic database is a particular case of a database that can be populated by crowdsourcing, for example when people report an urban event on a social network by writing a short message. An event can describe an accident or a malfunctioning device in the urban area. The authorities then need to read and interpret the message in order to help injured people or to fix the reported problem, such as a broken light or a defect in the road. Our main interest is in working with short messages organized in a collection. Most of these messages do not carry geographical coordinates, but people tend to use recurring text patterns to describe an urban location, so the messages can be classified by the text patterns that describe a location. Our work tries to identify patterns inside a short text and to indicate when a pattern describes a location. When such a pattern is identified, our approach seeks to describe the place where the event is located. The source messages used are tweets reporting events from several Mexican cities.
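The idea of treating a location pattern as a sequence of grammatical categories can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from this work: the tag names follow a generic part-of-speech scheme, the two patterns in `LOCATION_PATTERNS` are hypothetical, and the example tweet is tagged by hand rather than by a tagger.

```python
from typing import List, Sequence, Tuple

# A token paired with its grammatical category (part-of-speech tag).
Tagged = Tuple[str, str]

# Hypothetical location patterns, each a sequence of grammatical categories.
# These are illustrative only; they are not the patterns mined in the paper.
LOCATION_PATTERNS: List[Tuple[str, ...]] = [
    ("ADP", "DET", "NOUN", "PROPN"),  # e.g. "en la avenida Reforma"
    ("ADP", "NOUN", "PROPN"),         # e.g. "en avenida Reforma"
]


def find_location_phrases(
    tagged: Sequence[Tagged],
    patterns: Sequence[Tuple[str, ...]] = LOCATION_PATTERNS,
) -> List[str]:
    """Return every contiguous phrase whose tag sequence equals a pattern."""
    tags = [tag for _, tag in tagged]
    found = []
    for pattern in patterns:
        n = len(pattern)
        # Slide a window of the pattern's length over the tag sequence.
        for start in range(len(tags) - n + 1):
            if tuple(tags[start:start + n]) == tuple(pattern):
                words = [word for word, _ in tagged[start:start + n]]
                found.append(" ".join(words))
    return found


# Hand-tagged example message (assumed tags, not real tagger output).
tweet = [
    ("Accidente", "NOUN"),
    ("en", "ADP"),
    ("la", "DET"),
    ("avenida", "NOUN"),
    ("Reforma", "PROPN"),
]

print(find_location_phrases(tweet))
```

In this sketch a message is flagged as describing a location whenever at least one pattern matches, and the matched words give a candidate description of the place.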