Paper information - Revisiting the Predictability of Language: Response Completion in Social Media

The question "how predictable is English?" has long fascinated researchers. While prior work has focused on formal English typically used in news articles, we turn to texts generated by users in online settings that are more informal in nature. We are motivated by a novel application scenario: given the difficulty of typing on mobile devices, can we help reduce typing effort with message completion, especially in conversational settings? We propose a method for automatic response completion. Our approach models both the language used in responses and the specific context provided by the original message. Our experimental results on a large-scale dataset show that both components help reduce typing effort. We also perform an information-theoretic study in this setting and examine the entropy of user-generated content, especially in conversational scenarios, to better understand the predictability of user-generated English.

Bo Pang | Sujith Ravi
