Enhancing transport data collection through social media sources: methods, challenges and opportunities for textual data

Social media data now enriches and supplements information flow in various sectors of society. The question addressed here is whether social media can act as a credible information source of sufficient quality to meet the needs of transport planners, operators, policy makers and the travelling public. A typology of primary transport data needs and of current and new data sources is first established, after which the study focuses on social media textual data in particular. Three sub-questions are investigated: the potential to use social media data alongside existing transport data, the technical challenges in extracting transport-relevant information from social media, and the wider barriers to the uptake of this data. An overview of the text mining process used to extract relevant information from the corpus is followed by a review of the challenges this approach poses for the transport sector. These include ontologies, sentiment analysis, location names and the measurement of accuracy. Finally, institutional issues in the greater use of social media are highlighted, and it is concluded that social media information has not yet been fully explored. The contribution of this study is in scoping the technical challenges of mining social media data within the transport context, laying the foundation for further research in this field.
