SMA4TD: A social media analysis methodology for trajectory discovery in large-scale events

Abstract The widespread use of social media platforms allows scientists to collect huge amounts of data posted by people interested in a given topic or event. This data can be analyzed to infer patterns and trends in people's behavior related to a topic or an event on a very large scale. Social media posts are often tagged with geographical coordinates or other information that allows user positions to be identified, thus enabling mobility pattern analysis using trajectory mining techniques. This paper describes SMA4TD, a methodology for discovering behavior and mobility patterns of users attending large-scale public events by analyzing social media posts. The methodology is demonstrated through two case studies. The first is an analysis of geotagged tweets for learning the behavior of people attending the 2014 FIFA World Cup. The second is a mobility pattern analysis of the Instagram users who visited EXPO 2015. In both cases, a very high correlation (Pearson coefficient 0.7–0.9) was measured between official attendee numbers and those produced by our analysis. This result shows the effectiveness of the proposed methodology and confirms its accuracy.
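
The validation step mentioned above, correlating official attendee numbers with the numbers produced by the analysis, can be illustrated with a minimal sketch of the Pearson coefficient. This is not the authors' code: the function name `pearson` and the two sample series are hypothetical placeholders chosen only to show the computation, and they are not data from either case study.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT figures from the paper:
# official attendance per event and the count estimated from social media posts.
official = [10.0, 20.0, 30.0, 40.0]
estimated = [1.0, 2.1, 2.9, 4.2]

r = pearson(official, estimated)
print(f"Pearson r = {r:.3f}")
```

Because the coefficient is invariant to scale, the estimated counts do not need to match the official numbers in magnitude; only the relative variation across events matters, which suits social media samples that cover a fraction of all attendees.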
