Revisiting the Predictability of Language: Response Completion in Social Media

The question "how predictable is English?" has long fascinated researchers. While prior work has focused on the formal English typically used in news articles, we turn to user-generated text from online settings, which is more informal in nature. We are motivated by a novel application scenario: given the difficulty of typing on mobile devices, can we reduce typing effort with message completion, especially in conversational settings? We propose a method for automatic response completion. Our approach models both the language used in responses and the specific context provided by the original message. Our experimental results on a large-scale dataset show that both components help reduce typing effort. We also perform an information-theoretic study in this setting, examining the entropy of user-generated content, especially in conversational scenarios, to better understand the predictability of user-generated English.
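To make the two-component idea concrete, here is a minimal sketch of next-word prediction that mixes a language model of responses with the context supplied by the original message. It is an illustration under stated assumptions, not the method evaluated in the paper: the bigram model, the unigram context model, the linear interpolation, the weight `lam`, and the function names `train_bigram` and `predict_next` are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def train_bigram(responses):
    """Count word bigrams over tokenized responses (toy stand-in for a response LM)."""
    counts = defaultdict(Counter)
    for tokens in responses:
        prev = "<s>"  # sentence-start marker
        for tok in tokens:
            counts[prev][tok] += 1
            prev = tok
    return counts


def predict_next(counts, prev, context_tokens, lam=0.7):
    """Return the word maximizing lam * P_bigram(w | prev) + (1 - lam) * P_context(w).

    P_context is a unigram distribution over the original message's tokens.
    The interpolation weight lam is an assumed value, not one from the paper.
    """
    bigram = counts.get(prev, Counter())
    bigram_total = sum(bigram.values())
    context = Counter(context_tokens)
    context_total = sum(context.values())

    scores = {}
    for w in set(bigram) | set(context):
        p_lm = bigram[w] / bigram_total if bigram_total else 0.0
        p_ctx = context[w] / context_total if context_total else 0.0
        scores[w] = lam * p_lm + (1 - lam) * p_ctx
    return max(scores, key=scores.get) if scores else None


# Toy usage: the response model alone prefers "later", but a message that
# mentions "soon" shifts the prediction toward "soon".
responses = [["see", "you", "soon"], ["see", "you", "later"], ["see", "you", "later"]]
counts = train_bigram(responses)
no_context = predict_next(counts, "you", [])
with_context = predict_next(counts, "you", ["soon", "soon", "ok"], lam=0.5)
```

In this toy setup, the suggestion changes once the original message is taken into account, which mirrors the claim that both the response language and the message context contribute to completion.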
