Scaling Context Space

Context is used in many NLP systems as an indicator of a term's syntactic and semantic function. The accuracy of such a system depends on the quality and quantity of contextual information available to describe each term. However, the quantity variable is no longer fixed by limited corpus resources. Given fixed training time and computational resources, it makes sense for systems to invest time in extracting high-quality contextual information from a fixed corpus. However, with an effectively limitless quantity of text available, extraction rate and representation size need to be considered. We use thesaurus extraction with a range of context-extraction tools to demonstrate the interaction between context quantity, time and size on a corpus of 300 million words.

James R. Curran | Marc Moens
