Reducing the Size of the Representation for the uDOP-Estimate

The unsupervised Data Oriented Parsing (uDOP) approach has repeatedly been reported to achieve state-of-the-art performance in parsing experiments on different corpora. At the same time, the approach is demanding in both computation time and memory. This paper describes a method that decreases these demands. First, the problem is translated into the generation of probabilistic bottom-up tree automata (pBTA). Then it is explained how solving two standard problems for these automata reduces the size of the grammar. This reduction of the grammar size through efficient algorithms for pBTAs is the main contribution of this paper. Experiments suggest that it shrinks the grammar by a factor of 2. The paper also suggests some extensions of the original uDOP algorithm that are made possible or aided by the use of tree automata.
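
As a rough illustration of the kind of object the abstract refers to, the following is a minimal sketch of a weighted bottom-up tree automaton that assigns a weight to a tree. It is not the paper's construction: the rule table `RULES`, the final weights `FINAL`, the state names, and the functions `state_weights` and `tree_weight` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy automaton (an assumption for illustration only).
# A rule maps (label, tuple of child states) to a list of (state, weight).
RULES = {
    ("a", ()): [("X", 1.0)],
    ("b", ()): [("X", 1.0)],
    ("f", ("X", "X")): [("X", 0.5), ("S", 0.5)],
}
# Weight for accepting a tree whose root ends in the given state.
FINAL = {"S": 1.0}


def state_weights(tree):
    """Return {state: weight} for a tree written as (label, child, child, ...)."""
    label, children = tree[0], tree[1:]
    # Process the children first: this is the bottom-up part.
    child_tables = [state_weights(child) for child in children]
    result = defaultdict(float)
    # Try every combination of states the children can be in.
    for combo in product(*(table.items() for table in child_tables)):
        states = tuple(state for state, _ in combo)
        weight = 1.0
        for _, child_weight in combo:
            weight *= child_weight
        for target, rule_weight in RULES.get((label, states), []):
            result[target] += weight * rule_weight
    return dict(result)


def tree_weight(tree):
    """Sum the root state weights, scaled by the final weights."""
    table = state_weights(tree)
    return sum(w * FINAL.get(state, 0.0) for state, w in table.items())


small = ("f", ("a",), ("b",))
nested = ("f", small, ("a",))
print(tree_weight(small), tree_weight(nested))
```

In this toy setting, two states that behave identically in every context could be merged without changing any tree's weight, which is the intuition behind shrinking a grammar through automaton minimization.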
