Automatic Processing of Linguistic Data as a Feedback for Linguistic Theory

The paper describes a method for identifying a set of interesting constructions in a syntactically annotated corpus of Czech – the Prague Dependency Treebank – by applying an automatic procedure of analysis by reduction to the trees in the treebank. The procedure reveals certain linguistic phenomena that go beyond ‘dependency nature’ (and thus generally pose a problem for dependency-based formalisms). Moreover, it provides feedback indicating that the annotation of a particular phenomenon might be inconsistent.
