Dependency Syntax Analysis Using Grammar Induction and a Lexical Categories Precedence System

The unsupervised approach to syntactic analysis tries to discover the structure of a text using only raw text. In this paper we explore this approach using grammar inference algorithms. Although there is still room for improvement, our approach tries to minimize the effect of the current limitations of some grammar inductors in two ways: by adding morphological information before the grammar induction process, and by introducing a novel system for converting a shallow parse to dependencies, which reconstructs information about the heads that the inductor left undiscovered by means of a lexical categories precedence system. The performance of our parser, which needs no syntactically tagged resources or rules and is trained with a small corpus, is 10% below that of commercial semi-supervised dependency analyzers for Spanish, and comparable to the state of the art for English.
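As a rough illustration of how a lexical categories precedence system might turn a shallow (bracketed) parse into dependencies, consider the minimal Python sketch below. It is not the paper's implementation: the precedence order in `PRECEDENCE`, the tag names, the token format `(index, word, tag)`, and the helper names `head_of` and `to_dependencies` are all assumptions made for this example. The idea shown is only that, within each bracket, the token whose category ranks highest is taken as the head, and the heads of the remaining elements attach to it.

```python
# Hypothetical precedence order: categories listed earlier win when a head
# must be chosen. This ordering is an illustrative assumption, not the one
# used in the paper.
PRECEDENCE = ["VERB", "NOUN", "PREP", "ADJ", "ADV", "DET"]
RANK = {tag: i for i, tag in enumerate(PRECEDENCE)}


def head_of(node):
    """Return the head token of a node.

    A token is a tuple (index, word, tag); a chunk is a list of tokens
    and/or nested chunks.
    """
    if isinstance(node, tuple):
        return node
    heads = [head_of(child) for child in node]
    # Unknown tags get the lowest precedence; ties go to the leftmost token.
    return min(heads, key=lambda tok: RANK.get(tok[2], len(PRECEDENCE)))


def to_dependencies(node, deps=None):
    """Collect (head_word, dependent_word) pairs from a bracketed parse."""
    if deps is None:
        deps = []
    if isinstance(node, tuple):
        return deps
    head = head_of(node)
    for child in node:
        child_head = head_of(child)
        if child_head != head:
            deps.append((head[1], child_head[1]))
        to_dependencies(child, deps)
    return deps


# Shallow parse of "the dog chased a cat": two bracketed chunks plus a verb.
tree = [
    [(0, "the", "DET"), (1, "dog", "NOUN")],
    (2, "chased", "VERB"),
    [(3, "a", "DET"), (4, "cat", "NOUN")],
]

for head_word, dependent_word in to_dependencies(tree):
    print(f"{head_word} -> {dependent_word}")
```

In this sketch the inductor only needs to supply the brackets; the precedence table decides which word heads each bracket, which is the kind of information a grammar inductor typically does not provide.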
