Word-length preferences in Chinese: a corpus study

Are there preferred word-length combinations in Chinese? If so, are they motivated by semantics, syntax, prosody, or a combination of these? Although the issue has long been discussed, opinions remain divided. This study offers a quantitative analysis of word-length patterns in Chinese [N N] and [V O] sequences, using the Lancaster Corpus of Mandarin Chinese. The results show that 1+2 (a monosyllabic word followed by a disyllabic one) is overwhelmingly disfavored in [N N], and that 2+1 (a disyllabic word followed by a monosyllabic one) is overwhelmingly disfavored in [V O]. Apparent exceptions, ranging between 1% and 2%, are limited to certain specific structures; when these are factored out, both 1+2 [N N] and 2+1 [V O] fall well below 1%, whether measured by token count or by type count. The results bear on several theoretical debates, including the validity of word-length preferences in Chinese, the motivation for the preferences, the extent and nature of exceptions, and the interaction among syntax, semantics, and phonology.
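The distinction between token count and type count can be illustrated with a minimal Python sketch. It is not the study's actual procedure: the function names and the toy word pairs are hypothetical, and it assumes that each Chinese character corresponds to one syllable, so that word length can be read off the character count.

```python
from collections import Counter


def length_pattern(first, second):
    """Return a pattern such as '2+1'.

    Assumption: one character equals one syllable.
    """
    return f"{len(first)}+{len(second)}"


def pattern_shares(pairs):
    """Return (token_shares, type_shares), each mapping pattern -> percentage.

    Token shares count every occurrence of a word pair; type shares
    count each distinct word pair only once.
    """
    token_counts = Counter(length_pattern(a, b) for a, b in pairs)
    type_counts = Counter(length_pattern(a, b) for a, b in set(pairs))

    def to_percent(counts):
        total = sum(counts.values())
        return {pattern: 100 * n / total for pattern, n in counts.items()}

    return to_percent(token_counts), to_percent(type_counts)


# Toy [N N] pairs, invented for illustration only (not corpus data).
toy_nn = [
    ("煤炭", "商店"),  # 2+2
    ("煤炭", "商店"),  # 2+2, a repeated token of the same type
    ("煤炭", "店"),    # 2+1
    ("煤", "店"),      # 1+1
]

token_shares, type_shares = pattern_shares(toy_nn)
```

In this toy sample the repeated pair raises the token share of 2+2 above its type share, which is why a study of this kind would report both measures.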
