Quantitative Comparative Syntax on the Cantonese-Mandarin Parallel Dependency Treebank

This paper describes a new Cantonese-Mandarin parallel dependency treebank. We discuss the extent to which the treebank supports comparative measures aimed at quantifying structural differences between the two languages. After presenting syntactic differences between the two languages, we compute various frequency measures on the treebank. We present the results and discuss whether they reflect differences in text genre, differences in annotation scheme design, or actual structural differences. Finally, we compare the structural differences to previous accounts of the observed constructions.
