History-Based Inside-Outside Algorithm

Grammar induction is one of the most important research areas in natural language processing. Supervised grammar induction requires a large treebank, and the lack of one for some natural languages, such as Persian, encouraged us to focus on unsupervised methods. We found the Inside-Outside (IO) algorithm, introduced by Lari and Young, to be a suitable platform and augmented it with a notion of history. The result is an improved unsupervised grammar induction method called History-based IO (HIO). Applying HIO to two very different natural languages, English and Persian, indicates that inducing more conditioned grammars improves the quality of the resulting grammar. In addition, our experiments on ATIS and WSJ show that HIO outperforms most current unsupervised grammar induction methods.

[1]  John D. Lafferty,et al.  Towards History-based Grammars: Using Richer Models for Probabilistic Parsing , 1993, ACL.

[2]  Heshaam Faili,et al.  An Application of Lexicalized Grammars in English-Persian Translation , 2004, ECAI.

[3]  Deniz Yuret,et al.  Discovery of linguistic relations using lexical attraction , 1998, ArXiv.

[4]  John D. Lafferty,et al.  Decision Tree Parsing using a Hidden Derivation Model , 1994, HLT.

[5]  Alexander Clark,et al.  Unsupervised Language Acquisition: Theory and Practice , 2002, ArXiv.

[6]  Pieter W. Adriaans,et al.  Grammar Induction as Substructural Inductive Logic Programming , 2001, Learning Language in Logic.

[7]  Eugene Charniak,et al.  Statistical Techniques for Natural Language Parsing , 1997, AI Mag..

[8]  Menno van Zaanen.  ABL: Alignment-Based Learning , 2000, COLING.

[9]  Frederick Jelinek,et al.  Towards history-based grammars: using richer models for probabilistic parsing , 1992 .

[10]  Heshaam Faili,et al.  Unsupervised grammar induction using history based approach , 2006, Comput. Speech Lang..

[11]  Mitchell P. Marcus,et al.  Pearl: A Probabilistic Chart Parser , 1991, EACL.

[12]  Mark Johnson.  The effect of alternative tree representations on tree bank grammars , 1998, CoNLL.

[13]  Andreas Stolcke,et al.  Inducing Probabilistic Grammars by Bayesian Model Merging , 1994, ICGI.

[14]  Kenneth Ward Church.  A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text , 1988, ANLP.

[15]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[16]  Karine Megerdoomian,et al.  Persian Computational Morphology: A Unification-Based Approach , 2000 .

[17]  Carl de Marcken,et al.  Unsupervised language acquisition , 1996, ArXiv.

[18]  Mitchell P. Marcus,et al.  Parsing a Natural Language Using Mutual Information Statistics , 1990, AAAI.

[19]  Eugene Charniak,et al.  Statistical language learning , 1997 .

[20]  George R. Doddington,et al.  The ATIS Spoken Language Systems Pilot Corpus , 1990, HLT.

[21]  Steve Young,et al.  Applications of stochastic context-free grammars using the Inside-Outside algorithm , 1990 .

[22]  Glenn Carroll,et al.  Two Experiments on Learning Probabilistic Dependency Grammars from Corpora , 1992 .

[23]  Dan Klein,et al.  Natural Language Grammar Induction Using a Constituent-Context Model , 2001, NIPS.

[24]  Rémi Zajac,et al.  Persian-English Machine Translation: An Overview of the Shiraz Project , 2000 .

[25]  Ted Briscoe,et al.  Robust stochastic parsing using the inside-outside algorithm , 1994, ArXiv.

[26]  Christopher D. Manning,et al.  The unsupervised learning of natural language structure , 2005 .