Paper Information - Building a Large Annotated Corpus of English: The Penn Treebank


Abstract: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, with over 3 million words of that material assigned skeletal grammatical structure. This material now includes a fully hand-parsed version of the classic Brown corpus. About one half of the papers at the ACL Workshop on Using Large Text Corpora this past summer were based on the materials generated by this grant.

Beatrice Santorini | Mitchell P. Marcus | Mary Ann Marcinkiewicz
