Improving Search Engines by Demoting Non-Relevant Documents

A good search engine aims to place the most relevant documents at the top of the result list. This paper describes a new technique called "Improving search engines by demoting non-relevant documents" (DNR) that improves precision by detecting and demoting non-relevant documents. DNR generates a new set of queries composed of the terms of the original query combined in different ways. The documents retrieved by those new queries are evaluated using a heuristic algorithm to detect the non-relevant ones. These non-relevant documents are then moved down the list, which raises the precision of the top-ranked results. The new technique is tested on the WT2g test collection using several retrieval models: the vector model based on the TF-IDF weighting measure, and probabilistic models based on the BM25 and DFR-BM25 weighting measures. Recall and precision ratios are used to compare the performance of the new technique against that of the original query.
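
The abstract does not spell out the heuristic itself, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that the new queries are all proper subsets of the original terms of at least two terms, that a document is flagged as non-relevant when none of those sub-queries retrieves it, and that retrieval is a toy Boolean AND match. The names sub_queries, demote, search, min_support and the sample documents are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def sub_queries(terms, min_size=2):
    """Build new queries from combinations of the original terms.
    Assumption: every proper subset with at least min_size terms."""
    queries = []
    for size in range(min_size, len(terms)):
        queries.extend(combinations(terms, size))
    return queries

def demote(ranked, terms, search, min_support=1, min_size=2):
    """Move documents retrieved by fewer than min_support sub-queries
    to the bottom of the list, preserving relative order otherwise.
    This support-count rule is an assumed stand-in for the paper's heuristic."""
    support = {doc: 0 for doc in ranked}
    for query in sub_queries(terms, min_size):
        for doc in set(search(query)):
            if doc in support:
                support[doc] += 1
    kept = [d for d in ranked if support[d] >= min_support]
    demoted = [d for d in ranked if support[d] < min_support]
    return kept + demoted

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

# Toy collection (hypothetical): document id -> set of index terms.
docs = {
    "d1": {"jaguar", "car", "dealer"},
    "d2": {"jaguar", "cat", "habitat"},
    "d3": {"jaguar", "habitat", "forest"},
    "d4": {"cat", "food"},
}

def search(query):
    """Toy Boolean AND retrieval: documents containing every query term."""
    return [d for d, doc_terms in docs.items() if set(query) <= doc_terms]

terms = ["jaguar", "cat", "habitat"]
original = ["d1", "d2", "d4", "d3"]   # ranking returned for the original query
relevant = {"d2", "d3"}               # assumed relevance judgments

reranked = demote(original, terms, search)
print(reranked)
print(precision_at_k(original, relevant, 2), precision_at_k(reranked, relevant, 2))
```

In this toy run, d1 and d4 match no two-term sub-query, so they drop below d2 and d3 and precision at rank 2 rises from 0.5 to 1.0. Recall over the full list is unchanged, since demotion reorders documents without removing any.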
