Parsing Partially Bracketed Input

A method is proposed to convert a Context-Free Grammar (CFG) into a Bracket Context-Free Grammar (BCFG). A BCFG can parse input strings that are annotated, in part or in whole, with structural information (brackets). Parsing partially bracketed strings arises naturally in several settings. One interesting application is semi-automatic treebank construction; another is parsing input strings that have first been annotated by an NP-chunker. Three ways of annotating an input string with structural information are introduced: marking a complete constituent with a pair of round brackets, marking the start or the end of a constituent with square brackets, and marking the type of a constituent by subscripting the brackets with that type. When an annotated input string is parsed with the BCFG, the number of generated parse trees can be reduced, because only parse trees that comply with the indicated structure are generated. An important non-trivial property of the proposed transformation is that it does not introduce spurious ambiguity.
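The abstract does not spell out the grammar transformation itself, so the sketch below is not the BCFG construction. It only illustrates the effect described above, under stated assumptions: a toy grammar in Chomsky normal form and a CKY parser in Python that counts parse trees while discarding any constituent that crosses a round bracket. A subscripted bracket additionally fixes the category of the bracketed span. The helper names (`crosses`, `allowed`, `count_parses`), the toy grammar and the `(i, j, label)` encoding of brackets are hypothetical, and square brackets (start or end only) are not modelled. The filter is sufficient because, in a binary tree whose root spans the whole sentence, a span that no constituent crosses is itself a constituent.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical grammar encoding (an assumption, not the paper's): Chomsky normal form.
Binary = List[Tuple[str, str, str]]        # (A, B, C) encodes A -> B C
Lexical = Dict[str, List[str]]             # word -> categories A with A -> word
Bracket = Tuple[int, int, Optional[str]]   # (i, j, label): words[i:j] is a constituent


def crosses(a: int, b: int, i: int, j: int) -> bool:
    """True if spans [a, b) and [i, j) overlap without either containing the other."""
    return a < i < b < j or i < a < j < b


def allowed(a: int, b: int, label: str, brackets: Sequence[Bracket]) -> bool:
    """May a node `label` over words[a:b] occur in a tree that respects `brackets`?"""
    for i, j, want in brackets:
        if crosses(a, b, i, j):
            return False  # this node would break up a bracketed constituent
        if (a, b) == (i, j) and want is not None and want != label:
            return False  # a subscripted bracket demands a different category
    return True


def count_parses(words: List[str], binary: Binary, lexical: Lexical,
                 start: str, brackets: Sequence[Bracket] = ()) -> int:
    """CKY that counts the parse trees of `words` compatible with round brackets."""
    n = len(words)
    for i, j, _ in brackets:
        if not 0 <= i < j <= n:
            raise ValueError(f"bracket ({i}, {j}) lies outside a {n}-word sentence")
    # chart[(a, b)][A] = number of compliant subtrees rooted in A over words[a:b]
    chart = defaultdict(lambda: defaultdict(int))
    for a, w in enumerate(words):
        for A in lexical.get(w, []):
            if allowed(a, a + 1, A, brackets):
                chart[(a, a + 1)][A] += 1
    for length in range(2, n + 1):
        for a in range(n - length + 1):
            b = a + length
            for m in range(a + 1, b):
                left, right = chart[(a, m)], chart[(m, b)]
                for A, B, C in binary:
                    if left[B] and right[C] and allowed(a, b, A, brackets):
                        chart[(a, b)][A] += left[B] * right[C]
    return chart[(0, n)][start]


# Classic PP-attachment ambiguity with a hypothetical toy grammar.
WORDS = "I saw the man with the telescope".split()
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("NP", "NP", "PP"), ("NP", "Det", "N"), ("PP", "P", "NP")]
LEXICAL = {"I": ["NP"], "saw": ["V"], "the": ["Det"], "man": ["N"],
           "telescope": ["N"], "with": ["P"]}

print(count_parses(WORDS, BINARY, LEXICAL, "S"))                  # 2
print(count_parses(WORDS, BINARY, LEXICAL, "S", [(2, 7, "NP")]))  # 1: (the man with the telescope)_NP
print(count_parses(WORDS, BINARY, LEXICAL, "S", [(1, 4, None)]))  # 1: (saw the man)
```

Without brackets, the sentence receives two parses: the PP attaches either to the verb phrase or to the noun phrase. Each single bracket in the usage lines leaves one parse. Filtering the chart is the simplest way to enforce such constraints. Transforming the grammar instead, as proposed above, presumably leaves the parser itself unchanged.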

Mark-Jan Nederhof, Gertjan van Noord, Martijn Wieling