Information retrieval based on context distance and morphology

We present an approach to information retrieval based on context distance and morphology. Context distance is a measure we use to assess how close the meanings of two words are. The context distance model measures semantic distance between words using both the local contexts of the words within a single document and lexical co-occurrence information across the set of documents to be retrieved. We also propose integrating the context distance model with morphological analysis when determining word similarity, so that the two sources of evidence can reinforce each other. Using the standard vector-space model, we evaluated the proposed method on a subset of the TREC-4 corpus (the AP88 and AP90 collections: 158,240 documents and 49 queries). Results show that the method improves 11-point average precision by 8.6%.
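The evaluation above is reported as 11-point average precision. As a reminder of how that figure is conventionally computed in TREC-style evaluations, here is a minimal Python sketch of 11-point interpolated average precision for a single query. It is not taken from the paper; the function name and its inputs (a ranked list of document ids and a set of relevant ids) are illustrative assumptions. A collection-level score would then be the mean of the per-query values over the query set (49 queries here).

```python
from typing import Sequence, Set


def eleven_point_average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """11-point interpolated average precision for one query (illustrative sketch).

    ranking:  document ids in the order the system returned them.
    relevant: ids of the documents judged relevant for this query.
    """
    if not relevant:
        raise ValueError("query has no relevant documents")

    # Record (recall, precision) each time a relevant document is retrieved.
    # Precision can only peak at these ranks, so they suffice for interpolation.
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    total = 0.0
    for i in range(11):  # recall levels 0.0, 0.1, ..., 1.0
        level = i / 10
        # Interpolated precision: the best precision at any recall >= level;
        # 0.0 if that recall level is never reached.
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11


if __name__ == "__main__":
    # Relevant documents d1 and d3 retrieved at ranks 1 and 3.
    print(eleven_point_average_precision(["d1", "d2", "d3", "d4", "d5"], {"d1", "d3"}))
```

Interpolation takes the maximum precision at or beyond each recall level, which makes the precision-recall curve non-increasing before the eleven values are averaged.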

Evelyne Tzoukermann | Hongyan Jing
