LanguageLogger: A Mobile Keyboard Application for Studying Language Use in Everyday Text Communication in the Wild

We present a concept and tool for studying language use in everyday mobile text communication (e.g. chats). For the first time, our approach enables researchers to collect comprehensive data on language use during unconstrained natural typing (i.e. without study tasks) while preserving privacy: no readable messages are logged. We achieve this by combining three customisable text abstraction methods that run directly on participants' phones. We report on our implementation as an Android keyboard app and on two evaluations. First, we simulate text reconstruction attempts on a large text corpus to inform the conditions under which privacy risks are minimised. Second, we assess people's experiences in a two-week field deployment (N=20). We release our app as an open-source project to facilitate research on open questions in HCI, Linguistics and Psychology, and we conclude with concrete ideas for future studies in these areas.
