Paper Information - Statistical Filtering and Subcategorization Frame Acquisition

Statistical Filtering and Subcategorization Frame Acquisition

Research into the automatic acquisition of subcategorization frames (SCFs) from corpora is starting to produce large-scale computational lexicons which include valuable frequency information. However, the accuracy of the resulting lexicons shows room for improvement. One significant source of error lies in the statistical filtering used by some researchers to remove noise from automatically acquired subcategorization frames. In this paper, we compare three different approaches to filtering out spurious hypotheses. Two hypothesis tests perform poorly, compared to filtering frames on the basis of relative frequency. We discuss reasons for this and consider directions for future research.

Anna Korhonen | Diana McCarthy | Genevieve Gorrell

[1] Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.

[2] Frederick B. Thompson, et al. English for the computer, 1966, AFIPS '66 (Fall).

[3] Ralph Grishman, et al. Comlex Syntax: Building a Computational Lexicon, 1994, COLING.

[4] Francesc Ribas, et al. On Learning more Appropriate Selectional Restrictions, 1995, EACL.

[5] Ted Pedersen, et al. Fishing for Exactness, 1996, ArXiv.

[6] Anoop Sarkar, et al. Automatic Extraction of Subcategorization Frames for Czech, 2000, COLING.

[7] Alex Waibel, et al. The Automatic Acquisition of Frequencies of Verb Subcategorization Frames from Tagged Corpora, 2002.

[8] Ted Briscoe, et al. Automatic Extraction of Subcategorization from Corpora, 1997, ANLP.

[9] Ted Briscoe, et al. The Derivation of a Grammatically Indexed Lexicon from the Longman Dictionary of Contemporary English, 1987, ACL.

[10] G. Leech. 100 million words of English, 1993, English Today.

[11] Gregory P. Knowles, et al. Manual of information to accompany the SEC corpus, 1988.