Event-Oriented Map Extraction From Web News Portal: Binary Map Case Study on Diphtheria Outbreak and Flood in Jakarta

The abundance of online news texts containing embedded geographical name references motivates higher-level analysis in the form of thematic maps. Such maps can be produced by performing automated geospatial information extraction and retrieval on relevant event-oriented corpora, which exist mainly in natural language form. However, unified methods and frameworks for this transformation are still lacking. We propose incorporating unsupervised topic modeling and word embedding to help accomplish the task of aggregating georeferenced data. The topic modeling tool suggests positive and negative keywords for a particular topic, while the word embedding improves recall by extending the keyword set with semantically similar keywords. The method was tested on an Indonesian news corpus and achieved results comparable to official binary thematic maps in two case studies: a flood event in Jakarta and diphtheria disease in Indonesia.
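
For illustration only, the keyword step summarized above can be sketched in a few lines of Python. This is a minimal sketch under assumptions that the abstract does not state: toy two-dimensional vectors stand in for trained word embeddings, cosine similarity with a hypothetical threshold of 0.8 serves as the expansion rule, place names are matched as single lowercase tokens against a hypothetical three-entry gazetteer, and the helper names `expand_keywords` and `binary_map` are invented for this example. The seed and negative keywords are supplied by hand here, whereas the proposed method obtains them from topic modeling.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, embeddings, threshold=0.8):
    """Add every vocabulary word whose vector is close to some seed keyword.

    Seeds without a vector are kept but cannot contribute new words.
    """
    expanded = set(seeds)
    for word, vec in embeddings.items():
        for seed in seeds:
            if seed in embeddings and cosine(vec, embeddings[seed]) >= threshold:
                expanded.add(word)
                break
    return expanded


def binary_map(articles, positive, negative, gazetteer):
    """Mark a region 1 if any on-topic article mentions it, else 0.

    An article is on-topic when it contains at least one positive keyword
    and no negative keyword.
    """
    affected = {region: 0 for region in gazetteer}
    for text in articles:
        tokens = set(text.lower().split())
        if tokens & positive and not tokens & negative:
            for region in gazetteer:
                if region in tokens:
                    affected[region] = 1
    return affected


# Toy data (all values are illustrative assumptions, not from the study).
embeddings = {
    "banjir": [1.0, 0.0],    # "flood"
    "genangan": [0.9, 0.1],  # "inundation", close to "banjir"
    "bola": [0.0, 1.0],      # "ball", unrelated
}
articles = [
    "genangan setinggi lutut terjadi di kemang",
    "turnamen sepak bola banjir gol di menteng",  # sports idiom, off-topic
    "banjir merendam cawang",
]
gazetteer = ["kemang", "cawang", "menteng"]

positive = expand_keywords({"banjir"}, embeddings)
negative = {"bola"}
result = binary_map(articles, positive, negative, gazetteer)
print(result)
```

In this toy run the first article is retrieved only because the expansion adds "genangan" to the positive set, which is the recall effect described above, while the negative keyword "bola" filters out the sports article that uses "banjir" figuratively.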