Using Term Location Information to Enhance Probabilistic Information Retrieval

Nouns are more important than other parts of speech in information retrieval and are more often found near the beginning or the end of sentences. In this paper, we investigate how rewarding terms based on their location within sentences affects information retrieval. In particular, we propose a novel Term Location (TEL) retrieval model based on BM25 to enhance probabilistic information retrieval, in which a kernel-based method is used to capture term placement patterns. Experiments on five TREC datasets of varied size and content indicate that the proposed model significantly outperforms the optimized BM25 and DirichletLM in MAP on all datasets with all kernel functions, and outperforms the optimized BM25 and DirichletLM in P@5 and P@20 on most of the datasets with different kernel functions.
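The abstract does not give the TEL formulas, so the following is only an illustrative sketch of the general idea: a kernel rewards term occurrences near either edge of a sentence, and the rewarded counts replace the raw term frequency in a standard BM25 term score. The Gaussian kernel shape, the parameter `sigma`, the way the reward enters BM25, and all function names are assumptions made for illustration, not the authors' actual model.

```python
import math


def location_reward(position, sentence_length, sigma=0.25):
    """Hypothetical Gaussian kernel: 1.0 at either sentence edge, lowest mid-sentence."""
    if sentence_length <= 1:
        return 1.0
    rel = position / (sentence_length - 1)  # 0.0 = first token, 1.0 = last token
    dist = min(rel, 1.0 - rel)              # distance to the nearer sentence edge
    return math.exp(-(dist ** 2) / (2 * sigma ** 2))


def location_weighted_tf(term, sentences, sigma=0.25):
    """Sum kernel rewards over every occurrence of `term` (assumed aggregation)."""
    total = 0.0
    for sentence in sentences:
        for i, token in enumerate(sentence):
            if token == term:
                total += location_reward(i, len(sentence), sigma)
    return total


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Standard BM25 score for one term; `tf` may be a location-weighted count."""
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / norm


# Toy document: a list of tokenized sentences (made-up data).
doc = [
    ["retrieval", "models", "rank", "documents", "by", "estimated", "relevance"],
    ["we", "evaluate", "each", "retrieval", "model", "on", "queries"],
]
tf_weighted = location_weighted_tf("retrieval", doc)
score = bm25_term_score(tf_weighted, doc_len=14, avg_doc_len=20.0,
                        n_docs=1000, doc_freq=50)
```

In this toy document, the sentence-initial occurrence of "retrieval" receives the full reward, while the mid-sentence occurrence contributes far less, so the weighted count falls between 1 and 2 rather than equaling the raw count of 2.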
