Paper information - Geo-Enabled Text Analytics through Sentiment Scoring and Hierarchical Clustering

Geo-Enabled Text Analytics through Sentiment Scoring and Hierarchical Clustering

The recent divergence of Data Science research towards Geoinformatics is well justified. Geographical dimensions are constantly captured by data collection engines in different shapes and forms. Data mining methods, however, expect highly structured and organized datasets, and organizing data is especially challenging for geographical forms. Another critical form of data that is difficult to structure is text. The continuous rise in content from social media and the Internet of Things (IoT) has made Text Mining more relevant than ever before. When longitude and latitude values are not present, geographical data are mostly represented through text (country, state, city). This paper focuses on mining bodies of text that are directly relevant to a geographical location, and on extracting knowledge from that text for a better understanding of a chosen political, social, or economic topic. In the novel method presented in this manuscript, geographical data are coupled with textual data to enable insights and correlations that are not possible otherwise. Additionally, the method mines for sentiments and assigns quantifiable sentiment scores. In geo-enabled mining of text, emojis and stop-words (to, from, the, and) pose a technical challenge; the method presented in this paper addresses that as well. Besides the descriptive data insights mentioned (user's location, tweet text, age, language, gender, and number of tweets), this paper introduces a method that describes geographical patterns within text using cosine distance measures and hierarchical clustering. Finally, experimental work is presented, and the results are recorded, evaluated, and used to draw conclusions and define future directions.
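The abstract names the building blocks (emoji and stop-word removal, sentiment scoring, cosine distance, hierarchical clustering) but does not give the pipeline itself. The following is a minimal Python sketch of how such steps might fit together, not the authors' implementation. The stop-word list, the sentiment lexicon, the sample tweets, the use of raw term-frequency vectors, and the choice of average linkage are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and sentiment lexicon (illustration only).
STOP_WORDS = {"to", "from", "the", "and", "a", "is", "in", "of"}
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}


def tokenize(text):
    """Lowercase, keep only runs of ASCII letters (this drops emojis,
    digits, and punctuation), then remove stop-words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def sentiment_score(tokens):
    """Sum of lexicon polarities divided by token count; 0.0 if empty."""
    if not tokens:
        return 0.0
    return sum(LEXICON.get(t, 0) for t in tokens) / len(tokens)


def cosine_distance(a, b):
    """1 minus the cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def hierarchical_clusters(vectors, n_clusters):
    """Agglomerative clustering with average linkage (an assumed choice).
    `vectors` maps a location label to its term-frequency Counter."""
    clusters = [[label] for label in vectors]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[x], vectors[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Made-up sample data: one aggregated tweet text per city.
tweets_by_city = {
    "Richmond": "I love the new transit plan, great idea \U0001F600",
    "Norfolk": "Great transit plan and I love it",
    "Fairfax": "The tax policy is bad and terrible",
    "Arlington": "I hate the bad tax policy \U0001F621",
}

tokens = {city: tokenize(text) for city, text in tweets_by_city.items()}
scores = {city: sentiment_score(toks) for city, toks in tokens.items()}
vectors = {city: Counter(toks) for city, toks in tokens.items()}
clusters = hierarchical_clusters(vectors, n_clusters=2)

print(scores)
print(clusters)
```

In this toy setup, cities whose text shares vocabulary end up in the same cluster, and each city also carries a numeric sentiment score, which is the kind of geographical pattern the abstract describes. A real pipeline would likely use a curated lexicon, TF-IDF weighting, and a library implementation of clustering.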

Ruixin Yang | Gerald Gendron | Feras A. Batarseh | Gayatri Nambiar
