Geo-Enabled Text Analytics through Sentiment Scoring and Hierarchical Clustering

The recent shift of Data Science research toward Geoinformatics is well justified. Data collection engines constantly capture geographical dimensions in different shapes and forms. Data mining methods, however, expect highly structured and organized datasets, and organizing data is especially challenging for geographical forms. Another critical form of data that is difficult to structure is text. Meanwhile, the continuous rise in content from social media and the Internet of Things (IoT) has made Text Mining more relevant than ever before. When longitude and latitude are not available, geographical data are mostly expressed through text (country, state, city). This paper focuses on mining bodies of text that are directly relevant to a geographical location, and on extracting knowledge from that text to better understand a chosen political, social, or economic topic. In the novel method presented in this manuscript, geographical data are coupled with textual data to enable insights and correlations that are not possible otherwise. Additionally, the method mines for sentiments and assigns quantifiable sentiment scores. In geo-enabled text mining, emojis and stop-words (to, from, the, and) pose a technical challenge; the method presented in this paper addresses that as well. Besides descriptive data insights (user's location, tweet text, age, language, gender, and number of tweets), this paper introduces a method that describes geographical patterns within text using cosine distance measures and hierarchical clustering. Finally, experimental work is presented, and the results are recorded, evaluated, and used to draw conclusions and define future directions.
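The pipeline outlined above can be illustrated with a minimal, pure-Python sketch. This is not the authors' implementation. It assumes a hypothetical stop-word list, a hypothetical emoji filter based on common Unicode emoji ranges, and a hypothetical toy sentiment lexicon. It aggregates term counts per location, measures cosine distance between the location vectors, and groups locations with naive average-linkage agglomerative clustering. The sample tweets are made up for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list (the paper names to, from, the, and); real pipelines use fuller lists.
STOP_WORDS = {"to", "from", "the", "and", "a", "is", "of", "in"}

# Hypothetical emoji filter covering common emoji code-point ranges; not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Hypothetical toy sentiment lexicon; the paper's actual scoring scheme is not reproduced here.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def tokenize(text):
    """Lowercase, strip emojis, split into words, and drop stop-words."""
    text = EMOJI_RE.sub(" ", text.lower())
    return [w for w in re.findall(r"[a-z']+", text) if w not in STOP_WORDS]


def sentiment_score(text):
    """Average lexicon score per remaining token (0.0 for empty text)."""
    tokens = tokenize(text)
    return sum(LEXICON.get(t, 0) for t in tokens) / len(tokens) if tokens else 0.0


def location_vectors(tweets):
    """Aggregate term counts per location from (location, text) pairs."""
    vectors = {}
    for location, text in tweets:
        vectors.setdefault(location, Counter()).update(tokenize(text))
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def average_linkage(vectors, n_clusters):
    """Naive agglomerative clustering (average linkage) on cosine distance."""
    names = sorted(vectors)
    dist = {(p, q): cosine_distance(vectors[p], vectors[q]) for p in names for q in names}
    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean pairwise distance between members of the two clusters.
                d = sum(dist[p, q] for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up sample input, for illustration only.
TWEETS = [
    ("Texas", "I love the great weather \U0001F600"),
    ("Texas", "great food and good people"),
    ("Ohio", "great weather and good food"),
    ("Maine", "terrible traffic to the city"),
]

if __name__ == "__main__":
    print(average_linkage(location_vectors(TWEETS), 2))
    print([round(sentiment_score(text), 3) for _, text in TWEETS])
```

On this toy input, Texas and Ohio share vocabulary and merge first, while Maine stays separate. A production version would more likely use a library linkage routine and a validated sentiment lexicon; the hand-written loop here only keeps the example dependency-free.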