Cross-language dependency treebank – An effective approach for contrastive study of languages: Comment on "Dependency distance: A new perspective on syntactic patterns in natural languages" by Haitao Liu et al.

In dependency grammar, dependency distance (DD) [1–3] is an important index of memory burden and an indicator of syntactic difficulty. To reduce this memory burden, human languages show a tendency toward dependency distance minimization (DDM). In their review [4], Liu et al. summarize a considerable body of previous work on DDM. Several studies have shown that language is a human-driven complex adaptive system and that the tendency toward DDM in syntactic structure is a human-driven linguistic universal, shaped by psychological and biological constraints arising from limited working memory.
Large-scale cross-language empirical corpora offer an effective and reliable means of seeking linguistic universals. Cross-language research necessarily involves comparing different languages. Language is a complex system, and every language has its own syntactic characteristics, which can be described more accurately through comparison with other languages. In this comment, I introduce a quantitative, contrastive study of the same dependency types (subjects and objects) in two languages (English and Chinese), based on bilingual dependency treebanks (corpora syntactically annotated with dependency schemes). The aim is to further demonstrate this linguistic universal in specific dependencies across languages, and to assess the advantages of the quantitative method in the contrastive study of languages.
To support the hypothesis that DDM is a linguistic universal, several researchers have carried out studies using various empirical data. Large-scale cross-language studies of 20 languages [2], 37 languages [5], and 30 languages [6] have been used to show that relatively short dependency distance is a recurring regularity of natural languages. Such large-scale studies provide an overall picture of the tendency toward DDM.
Based on the English–Chinese Dependency Treebank [7], Li explores the quantitative characteristics of subject and object dependencies in these two languages. The results show that the mean dependency distance (MDD) of subjects and objects is greater in Chinese than in English, which is consistent with the results for the whole treebanks. Chinese thus incurs a greater cognitive cost and places a heavier demand on working memory than English.
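As a minimal sketch of how MDD can be computed for specific dependency types, the following Python example assumes that the distance of a dependency is the absolute difference between the linear positions of the dependent and its governor, and that MDD for a type is the average of those distances over all its occurrences. The token format, the toy sentence, and the relation labels `subj` and `obj` are hypothetical illustrations; they are not taken from the English–Chinese Dependency Treebank [7] or its annotation scheme.

```python
from collections import defaultdict

# Hypothetical toy sentence: "The student read an interesting book".
# Each token is (position, head_position, relation); positions are 1-based
# and head_position 0 marks the root, which has no governor.
SENTENCE = [
    (1, 2, "det"),   # The
    (2, 3, "subj"),  # student
    (3, 0, "root"),  # read
    (4, 6, "det"),   # an
    (5, 6, "amod"),  # interesting
    (6, 3, "obj"),   # book
]


def mean_dependency_distance(sentences, relations=None):
    """Return a dict mapping each relation label to its MDD.

    Assumption: the distance of one dependency is |head - dependent|.
    If `relations` is given, only those labels are counted.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for sentence in sentences:
        for position, head, relation in sentence:
            if head == 0:
                continue  # the root has no governor, hence no distance
            if relations is not None and relation not in relations:
                continue
            totals[relation] += abs(head - position)
            counts[relation] += 1
    return {rel: totals[rel] / counts[rel] for rel in totals}


mdd = mean_dependency_distance([SENTENCE], relations={"subj", "obj"})
print(mdd)
```

For the toy sentence the subject is adjacent to its verb (distance 1) and the object is three positions away (distance 3). Running the same function separately over the English and the Chinese halves of a parallel treebank would give per-type MDD values that can be compared directly.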