Dependency relations and dependency distance: a statistical view based on Treebank

The dependency relation is the essential ingredient of any dependency-based theory of syntax. This paper presents statistical findings on dependency relations extracted from a Chinese dependency treebank. A sentence in the proposed treebank can easily be converted into an SSyntS (surface-syntactic structure) graph in Meaning-Text Theory. The statistics show that modifiers make up 55% of all dependencies, while actants account for the lower share of 45%. The paper demonstrates that the active and passive valence information of a word (or word class) can be extracted from the treebank. It also gives a formula for calculating the mean dependency distance (MDD) of a specific type of dependency relation in a language and obtains the MDD of every dependency type in Chinese. These figures show that some dependencies tend to span much greater distances than others, that dependency distance tends to be minimized, and that different dependency types differ in their preferred direction of dependency.
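The abstract does not reproduce the MDD formula, so the following is only a minimal sketch under stated assumptions: the dependency distance of one relation is taken to be the difference between the linear positions of governor and dependent, and the MDD of a dependency type is taken to be the mean of the absolute distances over all of its occurrences. The function names, the tuple layout, and the toy data are hypothetical and do not come from the treebank.

```python
from collections import defaultdict


def mdd_by_type(dependencies):
    """Mean absolute dependency distance per relation type.

    `dependencies` is an iterable of (relation, governor_pos, dependent_pos)
    tuples, where positions are 1-based word indices within a sentence.
    This tuple layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for relation, governor_pos, dependent_pos in dependencies:
        totals[relation] += abs(governor_pos - dependent_pos)
        counts[relation] += 1
    return {rel: totals[rel] / counts[rel] for rel in totals}


def governor_final_share(dependencies):
    """Share of occurrences per type in which the governor follows the dependent.

    Values near 1.0 or 0.0 indicate a strong directional preference.
    """
    final = defaultdict(int)
    counts = defaultdict(int)
    for relation, governor_pos, dependent_pos in dependencies:
        counts[relation] += 1
        if governor_pos > dependent_pos:
            final[relation] += 1
    return {rel: final[rel] / counts[rel] for rel in counts}


# Invented toy data, not taken from the Chinese treebank.
toy = [
    ("subj", 2, 1),
    ("obj", 2, 4),
    ("atr", 4, 3),
    ("subj", 3, 1),
]

mdd = mdd_by_type(toy)
direction = governor_final_share(toy)
```

With real data, the tuples would be read from the treebank's own annotation format; comparing the two dictionaries side by side gives a rough picture of which dependency types are long-range and which direction each one prefers.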
