WhatsApp usage patterns and prediction of demographic characteristics without access to message content

Rosenfeld et al. 1 Jerusalem College of Technology, Jerusalem, Israel. Email: rosenfa@jct.ac.il. 2 Bar-Ilan University, Ramat Gan, Israel. http://www.demographic-research.org
BACKGROUND Social networks on the Internet have become ubiquitous applications that allow people to easily share text, pictures, and audio and video files. Popular networks include WhatsApp, Facebook, Reddit, and LinkedIn.
OBJECTIVE We present an extensive study of the usage of the WhatsApp social network, an Internet messaging application that is quickly replacing SMS (short message service) messaging. To better understand people’s use of the network, we provide an analysis of over 6 million encrypted messages from over 100 users, with the objective of building demographic prediction models that use activity data but not the content of these messages.
METHODS We performed extensive statistical and numerical analysis of the data and found significant differences in WhatsApp usage across people of different genders and ages. We also entered the data into the Weka and pROC data mining packages and studied models created from decision trees, Bayesian networks, and logistic regression algorithms.
RESULTS We found that different gender and age demographics had significantly different usage habits in almost all message and group attributes. We also noted differences in users’ group behavior and created prediction models, including the likelihood that a given group would have relatively more file attachments and whether a group would contain a larger number of participants, a higher frequency of activity, quicker response times, and shorter messages.
CONCLUSIONS We were successful in quantifying and predicting a user’s gender and age demographic. Similarly, we were able to predict different types of group usage. All models were built without analyzing message content.
CONTRIBUTION The main contribution of this paper is the ability to predict user demographics without having access to users’ text content. We present a detailed discussion about the specific attributes that were contained in all predictive models and suggest possible applications based on these results.
