Real-Time Twitter Data Mining Approach to Infer User Perception Toward Active Mobility

This study evaluates the level of service of shared transportation facilities by mining geotagged social media data and analyzing the perceptions of road users. An algorithm is developed that uses text classification with contextual understanding to filter information relevant to users’ perceptions of active mobility. With a heuristic keyword-matching approach, about 75% of the retrieved tweets are out of context, so that approach is deemed unsuitable for information extraction from Twitter. This study implements six text classification models and compares their performance on tweet classification using precision, recall, F1 score (the harmonic mean of precision and recall), and accuracy. A logistic regression model built on a term frequency-inverse document frequency (TF-IDF) vectorizer performed best at classifying the tweets. The selected model is applied to real-world data to filter relevant information, and content analysis is performed to check the distribution of keywords within the filtered data. The findings show that the proposed method can help produce more relevant information on walking and biking facilities as well as safety concerns. By analyzing the sentiments of the filtered data, the existing condition of biking and walking facilities in the DC area can be inferred. This method can be a critical part of a decision support system for understanding the qualitative level of service of existing transportation facilities.
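The four metrics used for model comparison can be illustrated with a minimal sketch in plain Python. It is not the study's code: the function name `classification_metrics` and the toy labels are assumptions made for illustration, with 1 standing for a relevant tweet and 0 for an out-of-context one. F1 is computed here as the harmonic mean of precision and recall.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1, and accuracy for binary labels (1 = relevant)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, predicted relevant
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, predicted relevant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, predicted irrelevant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # irrelevant, predicted irrelevant

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels for eight tweets (illustrative only, not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting all four values side by side matters when relevant tweets are a minority of the stream: a classifier can reach high accuracy by labeling everything as out of context while its recall on relevant tweets stays near zero.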
