A Document Retrieval Model Based on Digital Signal Filtering

Information retrieval (IR) systems are generally designed to satisfy the information need that a user expresses as a query: they return a subset of documents selected from a collection, ordered by decreasing relevance to the query. Such systems are built on IR models, which define how documents and queries are represented and how the relevance of a document to a query is determined. In this article, we present a new IR model based on concepts from both IR and digital signal processing, such as Fourier analysis of signals and filtering. The model lets the whole IR process be viewed as a physical phenomenon in which the query corresponds to a signal, the documents correspond to filters, and the documents relevant to the query are determined by filtering that signal. Tests showed that the quality of the results provided by this IR model is comparable to the state of the art.
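
To make the signal-and-filter analogy concrete, the following is a minimal toy sketch in Python, not the model presented in the article. It rests on several assumptions that the abstract does not state: each vocabulary term is assigned its own frequency bin, the query spectrum holds the query term counts, each document acts as a filter whose gain at a bin shrinks as the term occurs more often in the document, and documents are ranked by the power left in the query signal after filtering (less residual power is taken to mean more relevant). All function names and the gain formula are hypothetical.

```python
from collections import Counter


def query_spectrum(query_terms, vocab):
    """Toy query 'signal' in the frequency domain: one bin per vocabulary term.

    Assumption: the magnitude at a bin is the raw count of that term in the query.
    """
    counts = Counter(query_terms)
    return [float(counts.get(term, 0)) for term in vocab]


def document_filter(doc_terms, vocab):
    """Toy document 'filter': a per-bin gain in (0, 1].

    Assumption: gain = 1 / (1 + term frequency), so a term that occurs
    in the document attenuates the matching bin of the query signal.
    """
    counts = Counter(doc_terms)
    return [1.0 / (1.0 + counts.get(term, 0)) for term in vocab]


def residual_power(spectrum, gains):
    """Power of the query signal after it passes through the filter."""
    return sum((s * g) ** 2 for s, g in zip(spectrum, gains))


def rank(query_terms, docs):
    """Order document names by increasing residual power (assumed relevance order)."""
    vocab = sorted({t for terms in docs.values() for t in terms} | set(query_terms))
    spectrum = query_spectrum(query_terms, vocab)
    scores = {
        name: residual_power(spectrum, document_filter(terms, vocab))
        for name, terms in docs.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical three-document collection used only for illustration.
docs = {
    "d1": "signal filtering for document retrieval".split(),
    "d2": "cooking recipes for pasta".split(),
    "d3": "retrieval of documents".split(),
}
ranking = rank("document retrieval".split(), docs)
print(ranking)
```

In this sketch, a document that contains more of the query terms removes more of the query signal's power and is ranked earlier. The actual term weighting, filter design, and ranking criterion of the proposed model are defined in the article itself and may differ from these choices.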
