Identifying tourists and analyzing spatial patterns of their destinations from location-based social media data

Abstract: Reliable travel behavior data is a prerequisite for the transportation planning process. In large tourism-dependent cities, tourists are the most dynamic population group, and their numbers and travel choices remain unknown to planners. Traditional travel surveys generally observe resident travel behavior and rarely target tourists. The ubiquitous use of social media platforms on smartphones has created a tremendous opportunity to gather digital traces of tourists at a large scale. In this paper, we present a framework for using location-based social media data to gather and analyze the travel behavior of tourists. We collected data on about 67,000 users from Twitter using its search interface for Florida. We first propose several filtering steps to create a reliable sample from the collected Twitter data. We then propose an ensemble classification technique to distinguish tourists from residents based on user coordinates, and compare its accuracy against state-of-the-art classification methods. Finally, we apply several clustering methods to find spatial patterns in tourists' destination choices. The resulting clusters are promising: they reveal the most popular tourist spots as well as some emerging tourist attractions in Florida. We assess the performance of the clustering techniques using internal clustering validation indices. We also analyze temporal patterns of tourist and resident activities to validate the classification of users into the two groups. The proposed filtering, identification, and clustering techniques will be of significant use for building individual-level tourist travel demand models from social media data.
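As a rough illustration of the clustering and validation step described above, the following minimal sketch clusters synthetic points with k-means and scores each candidate number of clusters with the Davies-Bouldin index, one example of an internal clustering validation index (lower values indicate more compact, better separated clusters). Everything here is an assumption made for illustration: the three hotspot coordinates and the noise level are invented rather than taken from the collected data, the function names are hypothetical, plain Euclidean distance on latitude and longitude stands in for a proper geographic distance, and the deterministic farthest-point initialisation is a simple stand-in for whatever initialisation the authors used.

```python
import math
import random


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def farthest_point_init(points, k):
    """Deterministic seeding: repeatedly pick the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old center if a cluster happens to become empty.
            updated.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers


def davies_bouldin(points, labels, centers):
    """Mean over clusters of the worst (scatter_i + scatter_j) / distance(center_i, center_j)."""
    k = len(centers)
    scatter = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        scatter.append(sum(math.dist(p, centers[c]) for p in members) / len(members)
                       if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


# Synthetic "tourist activity" points around three illustrative hotspots
# (approximate lat/lon of Orlando, Miami and Key West; not the paper's data).
rng = random.Random(42)
hotspots = [(28.54, -81.38), (25.76, -80.19), (24.56, -81.78)]
points = [(lat + rng.gauss(0, 0.05), lon + rng.gauss(0, 0.05))
          for lat, lon in hotspots for _ in range(40)]

# Score each candidate k and keep the one with the lowest index.
scores = {}
for k in range(2, 6):
    labels_k, centers_k = kmeans(points, k)
    scores[k] = davies_bouldin(points, labels_k, centers_k)
best_k = min(scores, key=scores.get)
labels, centers = kmeans(points, best_k)

for k, score in scores.items():
    print(f"k={k}: Davies-Bouldin={score:.3f}")
print("selected k:", best_k)
```

On this synthetic input the index should be lowest when the number of clusters matches the three hotspots. Real geotagged data would need a projected coordinate system or a haversine distance, and other indices such as the silhouette coefficient could be swapped in for the same comparison.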
