Understanding the User Behavior of Foursquare: A Data-Driven Study on a Global Scale

As a leading online service that provides both local search and social networking functions, Foursquare has attracted tens of millions of users around the world. Understanding the behavior of Foursquare users helps provide insights into location-based social networks (LBSNs). Most existing studies focus on a biased subset of users and therefore cannot give a representative view of the global user base. Meanwhile, although user-generated content (UGC) is an important reflection of user behavior, most existing UGC studies of Foursquare are based on check-ins. There is still no thorough study of tips, the primary type of UGC on Foursquare. In this article, by crawling and analyzing the global social graph and all published tips, we conduct the first comprehensive user behavior study of all 60+ million Foursquare users around the world. We make three main contributions. First, we find several unique, previously undiscovered features of the Foursquare social graph on a global scale, including a moderate level of reciprocity, a small average clustering coefficient, a giant strongly connected component, and a significant community structure. Apart from singletons, most Foursquare users are weakly connected with each other. Second, we conduct a thorough investigation of all tips published on Foursquare. We start by counting the number of tips published by each user and then examine tip content from the perspectives of venues, temporal patterns, and sentiment. Our results provide an informative picture of the tip publishing patterns of Foursquare users. Finally, as a practical scenario to help third-party application providers, we propose a supervised machine learning approach that predicts whether a user is influential from the user's profile and UGC, rather than from social connectivity information.
Our data-driven evaluation shows that this approach achieves good prediction performance, with an F1-score of 0.87 and an AUC of 0.88. Our findings provide a systematic view of the behavior of Foursquare users and are useful to several relevant parties, including LBSN service providers, Internet service providers, and third-party application providers.
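Reciprocity, the average clustering coefficient, and the size of the largest strongly connected component are standard graph metrics. The minimal Python sketch below shows how each can be computed, assuming the social graph is given as a list of directed (follower, followee) edges with comparable node IDs. The function names are hypothetical, clustering is measured on the undirected view of the graph (directed variants also exist), and this is not the authors' measurement code; a graph with tens of millions of nodes would need a far more scalable implementation.

```python
from collections import defaultdict


def reciprocity(edges):
    """Fraction of directed edges (u, v), u != v, whose reverse (v, u) also exists."""
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def average_clustering(edges):
    """Mean local clustering coefficient of the undirected view of the graph.

    Only nodes that appear in some edge are averaged; nodes with fewer than
    two neighbours contribute 0. Node IDs must be mutually comparable.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    if not nbrs:
        return 0.0
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue
        # Links among this node's neighbours, each unordered pair counted once.
        links = sum(1 for a in ns for b in ns if a < b and b in nbrs[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(nbrs)


def largest_scc_fraction(edges):
    """Fraction of nodes in the largest strongly connected component (Kosaraju)."""
    out_adj, in_adj, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        out_adj[u].append(v)
        in_adj[v].append(u)
        nodes.update((u, v))
    if not nodes:
        return 0.0
    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(out_adj[start]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(out_adj[nxt])))
                    break
            else:  # all successors explored: node is finished
                stack.pop()
                order.append(node)
    # Pass 2: DFS on the reversed graph in decreasing finish order;
    # each tree found is one strongly connected component.
    assigned, best = set(), 0
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        size, stack = 0, [start]
        while stack:
            node = stack.pop()
            size += 1
            for prev in in_adj[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        best = max(best, size)
    return best / len(nodes)


if __name__ == "__main__":
    # Toy graph: 1 <-> 2, 2 -> 3, 3 -> 1, 4 -> 1
    demo = [(1, 2), (2, 1), (2, 3), (3, 1), (4, 1)]
    # Prints values close to 0.4, 0.583 and 0.75.
    print(reciprocity(demo), average_clustering(demo), largest_scc_fraction(demo))
```

In the toy graph only the pair 1 and 2 follow each other, so two of the five edges are reciprocated, while nodes 1, 2, and 3 form the single non-trivial strongly connected component.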
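The influential-user predictor is evaluated with an F1-score and an AUC. F1 is the harmonic mean of precision and recall on the positive (influential) class, and the ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The minimal Python sketch below computes both from scratch; it is not the authors' evaluation code, the 0.5 decision threshold and the toy labels and scores are made up for illustration, and the pairwise AUC loop is quadratic, so it only suits small inputs.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # also avoids dividing by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC AUC via its pairwise-ranking (Mann-Whitney U) interpretation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]  # 1 = influential user (toy data)
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]  # hypothetical classifier scores
    preds = [1 if s >= 0.5 else 0 for s in scores]
    # Prints values close to 0.667 and 0.889.
    print(f1_score(labels, preds), roc_auc(labels, scores))
```

The two metrics answer different questions: F1 depends on the chosen threshold, whereas AUC summarizes how well the scores rank influential users above others across all thresholds.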
