Zipf's Law in Human-Machine Dialog

Zipf's law is a mathematically simple relation stating that the frequency of a word is approximately inversely proportional to its rank in the frequency table. The law is well known in both computational linguistics and the cognitive sciences, yet it has hardly been mentioned in the context of agent development. This is surprising, because principles governing language are likely to benefit the development of conversational agents. This paper serves as a starting point for exploring the role of Zipf's law in agent development: it shows that Zipf's law also applies to dialog and that it can shed light on human-machine dialog. In addition to overall word frequency distributions, we analyzed frequency distributions of words at specific positions in the sentence, as well as distributions of turn lengths. Zipf's law held in the vast majority of the analyses we conducted. We also investigated whether Zipf's law can be used to detect differences between human and agent-generated speech by correlating the two distributions. Even though both the human and the agent frequency distributions follow Zipf's law, they are not necessarily similar, which indicates where agent dialog may distinguish itself from human dialog. The findings can thus serve as a way to monitor the extent to which ubiquitous patterns in human-human dialog are also found in human-machine dialog.
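As an illustration of the relation described above, Zipf's law is commonly written as f(r) ∝ r^(-alpha), where f(r) is the frequency of the word at rank r and alpha is close to 1. The following is a minimal sketch, not the procedure used in the paper: it assumes a plain least-squares fit in log-log space (a crude estimator of the exponent), and the function names and the toy token list are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted in descending order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs):
    """Estimate alpha in f(r) ~ r^(-alpha) by least squares on log-log data.

    Assumption: a simple log-log regression; this is only an illustration
    and not necessarily the fitting method used in the paper.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The regression slope is -alpha, so negate it.
    return -sxy / sxx


# Hypothetical toy corpus built so that frequency is exactly 60 / rank.
tokens = []
for rank, word in enumerate(["the", "you", "is", "okay", "left"], start=1):
    tokens.extend([word] * (60 // rank))

freqs = rank_frequency(tokens)
alpha = zipf_exponent(freqs)
print(freqs, round(alpha, 3))
```

The same two functions could be applied separately to the human turns and the agent turns of a dialog corpus, or to words at a fixed sentence position, to compare the resulting rank-frequency curves.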
