Paper Information - Word-Order Analysis Based Upon Treebank Data

Word-Order Analysis Based Upon Treebank Data

The paper describes an experiment that attempts to quantify word-order properties of three Indo-European languages (Czech, English and Farsi). The investigation is driven by the effort to find an objective way to compare natural languages with respect to their degree of word-order freedom. Unlike similar studies, which concentrate on either a purely linguistic or a purely statistical approach, our experiment tries to combine both: the observations are verified against large samples of sentences from available treebanks, and, at the same time, we exploit the ability of our tools to analyze selected important phenomena (e.g., the differences between the word order of a main and a subordinate clause) more deeply.

Vladislav Kubon | Markéta Lopatková
