An Information Retrieval Model Based on Discrete Fourier Transform

Information Retrieval (IR) systems combine a variety of techniques stemming from logical, vector-space and probabilistic models. These combinations have produced a significant increase in retrieval effectiveness since the early 1990s. Nevertheless, the quest for new frameworks has been no less intense than research on optimizing and experimenting with the most common retrieval models. This paper presents a new framework for IR based on the Discrete Fourier Transform (DFT). The model represents each query term as a sine curve and a query as the sum of these sine curves, which gives the query an elegant and sound mathematical form. The sinusoidal representation of the query is transformed from the time domain to the frequency domain through the DFT, and the result is a spectrum. Each document of the collection corresponds to a set of filters, and the retrieval operation corresponds to filtering the spectrum: for each document, the spectrum is filtered and the power of the filtered spectrum is computed. The documents are then ranked by this power, such that the more a document decreases the power of the spectrum, the higher its rank. This paper is mainly theoretical, and the retrieval algorithm is reported to suggest the feasibility of the proposed model. Some small-scale experiments carried out to test the effectiveness of the algorithm indicate performance comparable to the state of the art.
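
The pipeline outlined in the abstract (sine curve per query term, DFT, per-document filtering, ranking by power) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not say how frequencies are assigned to terms, what the filters look like, or how document weights are derived. The sketch assumes that the i-th query term gets integer frequency i + 1, that every sine has unit amplitude, and that a document attenuates the spectral bins of each matching term by a gain of 1 - w, where w in [0, 1] is a hypothetical document term weight. All function names are illustrative.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def query_signal(query_terms, n_samples):
    """Represent a query as a sum of unit-amplitude sine curves.

    Assumption: the i-th term is given integer frequency i + 1, so n_samples
    must exceed 2 * len(query_terms) to keep every frequency below Nyquist.
    """
    freqs = {term: i + 1 for i, term in enumerate(query_terms)}
    signal = [
        sum(math.sin(2 * math.pi * f * t / n_samples) for f in freqs.values())
        for t in range(n_samples)
    ]
    return signal, freqs


def power(spectrum):
    """Signal power computed from the spectrum (Parseval's relation)."""
    return sum(abs(c) ** 2 for c in spectrum) / len(spectrum)


def filtered_power(spectrum, freqs, doc_weights):
    """Apply a document's (assumed) filters and return the residual power.

    Assumption: a term with weight w scales its two spectral bins by 1 - w.
    """
    n = len(spectrum)
    filtered = list(spectrum)
    for term, f in freqs.items():
        gain = 1.0 - doc_weights.get(term, 0.0)
        filtered[f] *= gain        # positive-frequency bin
        filtered[n - f] *= gain    # mirrored negative-frequency bin
    return power(filtered)


def rank(query_terms, docs, n_samples=32):
    """Rank documents so that the largest power decrease comes first."""
    signal, freqs = query_signal(query_terms, n_samples)
    spectrum = dft(signal)
    residual = {
        doc_id: filtered_power(spectrum, freqs, weights)
        for doc_id, weights in docs.items()
    }
    return sorted(residual, key=residual.get)


if __name__ == "__main__":
    # Hypothetical document term weights, for illustration only.
    docs = {
        "d1": {"fourier": 0.9, "retrieval": 0.8},
        "d2": {"fourier": 0.2},
        "d3": {},
    }
    print(rank(["fourier", "retrieval"], docs))
```

Under these assumptions, a document that matches more query terms with larger weights removes more of the spectrum's power and is therefore ranked higher, which mirrors the ranking rule stated in the abstract.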
