Crowdsourcing has become an important tool in areas such as business and marketing. It can help organizations solve large-scale problems in areas including traffic management and political campaigning. Toponym extraction is necessary when analyzing crowdsourced data for traffic tracking or event reporting. Dictionaries and rule-based analysis are commonly used for matching and extracting entities from text. However, building an effective dictionary is not easy, especially when it must cover a large number of locations. Named Entity Recognition (NER) can help address this, but it has limitations of its own. In this paper, we describe an improved approach to toponym extraction from Twitter messages that combines a dictionary with NER. Because tweets are limited to 280 characters, locations are usually referred to by abbreviations. The variety of forms that location names take, together with the unstructured language of tweets, is challenging for both the dictionary and NER methods. We divided tweets into four categories to investigate the effect of analyzing messages from different domains. The average accuracy was 49.18% when using only the dictionary, 59.30% when using only NER, and 75.43% when using the hybrid method.
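To make the idea of combining a dictionary with NER concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes a small gazetteer that maps surface forms, including abbreviations, to canonical names, and it stands in a toy pattern-based function for a trained NER model. The names `GAZETTEER`, `dictionary_toponyms`, `toy_ner`, and `hybrid_toponyms`, as well as the example locations, are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical gazetteer: surface forms (including abbreviations) -> canonical name.
GAZETTEER: Dict[str, str] = {
    "central station": "Central Station",
    "central stn": "Central Station",
    "main street": "Main Street",
    "main st": "Main Street",
}


def dictionary_toponyms(text: str, gazetteer: Dict[str, str]) -> List[str]:
    """Return canonical names whose surface forms occur in the text."""
    lowered = text.lower()
    found: List[str] = []
    # Try longer surface forms first so full names are checked before abbreviations.
    for surface in sorted(gazetteer, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        if re.search(pattern, lowered):
            name = gazetteer[surface]
            if name not in found:
                found.append(name)
    return found


def toy_ner(text: str) -> List[str]:
    """Placeholder for a trained NER model (assumption): capitalized words after 'at' or 'near'."""
    pattern = r"\b(?:at|near)\s+((?:[A-Z][\w-]*)(?:\s+[A-Z][\w-]*)*)"
    return [m.group(1) for m in re.finditer(pattern, text)]


def hybrid_toponyms(
    text: str,
    gazetteer: Dict[str, str],
    ner: Callable[[str], List[str]],
) -> List[str]:
    """Union of dictionary matches and NER candidates, without duplicates."""
    results = dictionary_toponyms(text, gazetteer)
    seen = {name.lower() for name in results}
    for candidate in ner(text):
        # Normalize NER output through the gazetteer when it is a known surface form.
        canonical = gazetteer.get(candidate.lower(), candidate)
        if canonical.lower() not in seen:
            results.append(canonical)
            seen.add(canonical.lower())
    return results


tweet = "Heavy traffic at Riverside heading to central stn"
print(hybrid_toponyms(tweet, GAZETTEER, toy_ner))
```

In this sketch the dictionary recovers the abbreviated form ("central stn"), while the NER stand-in contributes a location that is missing from the gazetteer ("Riverside"). In practice `toy_ner` would be replaced by a model trained on tweet text.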