Comparing Methods to Collect and Geolocate Tweets in Great Britain

In the era of Big Data, the Internet has become one of the main data sources: data can be collected at relatively low cost and used for a wide range of purposes. To support solid decisions in any field in a timely manner, it is essential to increase the efficiency of data production as well as data accuracy and reliability. Within this framework, our paper aims to identify an optimized and flexible method to collect and, at the same time, geolocate social media information over a whole country. In particular, we compare three alternative methods of collecting data from the social media platform Twitter. The comparison rests on four main criteria: collection time, dataset size, pre-processing load, and geographic distribution. Our findings for Great Britain identify one of these methods as the best option, since it collects both the highest number of tweets per hour and the highest percentage of unique tweets per hour. Furthermore, this method reduces the computational effort needed to pre-process the collected tweets (e.g., it shows the lowest collection times and the lowest number of duplicates within the geographical areas) and enhances the territorial coverage (when compared with the population distribution). At the same time, the effort required to set up this method is manageable, and the method is less prone to arbitrary decisions by the researcher.
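As an illustration of the two headline measures mentioned above (tweets per hour and percentage of unique tweets), the following is a minimal Python sketch. It is not the authors' implementation: the function name `collection_metrics`, its parameters, and the sample identifiers are hypothetical, and it assumes that duplicates can be detected by comparing tweet identifiers.

```python
def collection_metrics(tweet_ids, hours):
    """Summarize one collection run.

    tweet_ids: identifiers of all collected tweets, duplicates included
               (hypothetical input; assumes duplicates share an identifier).
    hours:     duration of the collection run in hours.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    total = len(tweet_ids)
    unique = len(set(tweet_ids))  # drop repeated identifiers
    pct_unique = 100.0 * unique / total if total else 0.0
    return {
        "tweets_per_hour": total / hours,
        "unique_per_hour": unique / hours,
        "pct_unique": pct_unique,
    }


# Toy example: four collected tweets, one of them a duplicate, over two hours.
sample = collection_metrics(["a", "b", "b", "c"], hours=2)
```

Computing the same summary for each collection method would allow the methods to be compared on dataset size and duplicate rate under identical definitions.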
