How Many Multiword Expressions do People Know?

What is a multiword expression (MWE), and how many are there? Mark Liberman gave a great invited talk at ACL-89, titled “How Many Words Do People Know?”, in which he spent the entire hour questioning the question. Many of the same questions apply to multiword expressions. What is a word? An expression? What is many? What is a person? What does it mean to know? Rather than answer these questions, this article will use them, as Liberman did, as an excuse for surveying how such issues are addressed in a variety of fields: computer science, Web search, linguistics, lexicography, educational testing, psychology, statistics, and so on.
