Robust stochastic parsing using the inside-outside algorithm

The paper describes a parser for sequences of (English) part-of-speech labels that uses a probabilistic grammar trained with the inside-outside algorithm. The initial (meta)grammar is defined by a linguist, and further rules compatible with the metagrammatical constraints are generated automatically. During training, rules with very low probability are rejected, yielding a wide-coverage parser capable of ranking alternative analyses. A series of corpus-based experiments evaluates the parser's performance.
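The abstract does not spell out the computation, so the following is only a minimal sketch of two ingredients it mentions: the inside pass (one half of inside-outside) over a sequence of part-of-speech labels, and the rejection of low-probability rules followed by renormalisation. The toy grammar, its probabilities, the tag names, the threshold, and the function names `inside` and `prune` are all assumptions made for illustration; they are not taken from the paper. The grammar is assumed to be in Chomsky normal form.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not from the paper).
# Binary rules: (parent, left child, right child) -> probability
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("NP", "DET", "N"): 0.7,
    ("NP", "NP", "PP"): 0.3,
    ("VP", "V", "NP"): 0.6,
    ("VP", "VP", "PP"): 0.4,
    ("PP", "P", "NP"): 1.0,
}
# Lexical rules: (preterminal, part-of-speech label) -> probability
UNARY = {
    ("DET", "det"): 1.0,
    ("N", "noun"): 1.0,
    ("V", "verb"): 1.0,
    ("P", "prep"): 1.0,
}


def inside(tags, binary, unary):
    """Return inside probabilities beta[(i, j, A)] = P(A derives tags[i:j])."""
    n = len(tags)
    beta = defaultdict(float)
    # Width-1 spans come from the lexical rules.
    for i, tag in enumerate(tags):
        for (parent, label), p in unary.items():
            if label == tag:
                beta[(i, i + 1, parent)] += p
    # Wider spans sum over every split point and every binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p in binary.items():
                    p_left = beta.get((i, k, left), 0.0)
                    p_right = beta.get((k, j, right), 0.0)
                    if p_left and p_right:
                        beta[(i, j, parent)] += p * p_left * p_right
    return beta


def prune(rules, threshold):
    """Drop rules below `threshold`, then renormalise per parent symbol."""
    kept = {rule: p for rule, p in rules.items() if p >= threshold}
    totals = defaultdict(float)
    for rule, p in kept.items():
        totals[rule[0]] += p
    return {rule: p / totals[rule[0]] for rule, p in kept.items()}


tags = ["det", "noun", "verb", "det", "noun", "prep", "det", "noun"]
beta = inside(tags, BINARY, UNARY)
sentence_prob = beta[(0, len(tags), "S")]  # sums over both PP attachments
```

Here `sentence_prob` sums the probabilities of the two analyses of the tag sequence (prepositional phrase attached to the verb phrase or to the noun phrase), which is the quantity a full inside-outside re-estimation step would normalise by. Ranking alternative analyses would instead require keeping the best derivation per span rather than the sum.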
