Using genetic algorithm for Persian grammar induction

Most efficient computational approaches to NLP tasks are supervised methods that require annotated corpora. The lack of annotated data for Persian has encouraged researchers to devote more interest and effort to unsupervised and semi-supervised approaches. This paper presents a novel semi-supervised approach to Persian grammar inference, called Genetic-based inside-outside (GIO), which induces a grammar model in the PCFG formalism. GIO is an extension of the inside-outside algorithm enriched with some notions from genetic algorithms. In a pure genetic algorithm for grammar induction, the randomly generated initial population makes the search computationally expensive, so we use the inside-outside algorithm to generate the initial population. Our experiments show that our approach outperforms the other methods that have been applied to Persian grammar induction.
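The idea of seeding a genetic search from an existing estimate, rather than from random grammars, can be illustrated with a minimal Python sketch. It is not the GIO implementation. The toy grammar, the transliterated sentences, the fitness function (corpus log-likelihood computed with the inside algorithm), and the operators `mutate` and `crossover` are all assumptions made for illustration. `SEED_GRAMMAR` merely stands in for the output of an inside-outside run, which is not computed here.

```python
import math
import random

# Toy PCFG in Chomsky normal form. Binary rules are (A, B, C); lexical rules
# are (A, word). ASSUMPTION: this grammar stands in for an inside-outside
# estimate; the rules and probabilities are invented for illustration.
SEED_GRAMMAR = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "NP", "V"): 1.0,
    ("NP", "ali"): 0.5,
    ("NP", "ketab"): 0.5,
    ("V", "xarid"): 0.5,
    ("V", "did"): 0.5,
}
CORPUS = ["ali ketab xarid", "ali ketab xarid", "ali ketab did"]


def normalize(grammar):
    """Rescale so that the rules sharing a left-hand side sum to 1."""
    totals = {}
    for rule, p in grammar.items():
        totals[rule[0]] = totals.get(rule[0], 0.0) + p
    return {rule: p / totals[rule[0]] for rule, p in grammar.items()}


def inside_probability(grammar, words, start="S"):
    """Inside algorithm: probability that `start` derives `words`."""
    n = len(words)
    chart = {}  # chart[(i, j)][A] = P(A derives words[i:j])
    for i, word in enumerate(words):
        cell = {}
        for rule, p in grammar.items():
            if len(rule) == 2 and rule[1] == word:
                cell[rule[0]] = cell.get(rule[0], 0.0) + p
        chart[(i, i + 1)] = cell
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = {}
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for rule, p in grammar.items():
                    if len(rule) == 3 and rule[1] in left and rule[2] in right:
                        a, b, c = rule
                        cell[a] = cell.get(a, 0.0) + p * left[b] * right[c]
            chart[(i, j)] = cell
    return chart.get((0, n), {}).get(start, 0.0)


def log_likelihood(grammar, corpus):
    """Fitness (an assumption): log-likelihood of the corpus."""
    total = 0.0
    for sentence in corpus:
        p = inside_probability(grammar, sentence.split())
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def mutate(grammar, rng, scale=0.3):
    """Multiply each probability by log-normal noise, then renormalize."""
    noisy = {r: p * math.exp(rng.gauss(0.0, scale)) for r, p in grammar.items()}
    return normalize(noisy)


def crossover(a, b, rng):
    """Take each left-hand side's whole rule distribution from one parent."""
    take_a = {lhs: rng.random() < 0.5 for lhs in sorted({r[0] for r in a})}
    return {r: (a[r] if take_a[r[0]] else b[r]) for r in a}


def evolve(seed, corpus, pop_size=20, generations=30, rng=None):
    """Genetic search whose initial population is built around `seed`."""
    rng = rng or random.Random(0)
    # Seeded initialization: the seed itself plus perturbed copies of it.
    population = [normalize(seed)]
    population += [mutate(seed, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda g: log_likelihood(g, corpus), reverse=True)
        parents = population[: pop_size // 2]  # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=lambda g: log_likelihood(g, corpus))
```

Calling `evolve(SEED_GRAMMAR, CORPUS)` returns a grammar with the same rules and re-weighted probabilities. Because the fitter half of each generation is carried over unchanged, the returned grammar never scores below the seed on the training corpus. A purely random initial population would typically need more generations to reach a comparable score.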
