Towards a Universal Grammar for Natural Language Processing

Universal Dependencies is a recent initiative to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. In this paper, I outline the motivation behind the initiative and explain how the basic design principles follow from these requirements. I then discuss the different components of the annotation standard, including principles for word segmentation, morphological annotation, and syntactic annotation. I conclude with some thoughts on the challenges that lie ahead.
