Fourier domain scoring: a novel document ranking method

Current document retrieval methods use a vector space similarity measure to score the relevance of each document to a query. The central problem with these methods is that they ignore the spatial information within a document, that is, where in the text the query terms occur. We present a new method, Fourier Domain Scoring (FDS), which uses the Fourier transform to exploit this spatial information and produce a more accurate relevance ordering of a document set. We show that FDS improves precision over vector space similarity measures for the common case of short, Web-like queries, and gives results similar to the vector space measures for longer queries.
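To make the idea of spatial information concrete, the following is a minimal sketch, not the scoring formula of the paper itself. It assumes each document is split into a fixed number of equal-width bins, each query term is turned into a per-bin count signal, and the discrete Fourier transform of those signals is compared across terms. The function names (`term_signal`, `dft`, `spatial_score`), the bin count, and the magnitude-times-phase-agreement weighting are all illustrative assumptions.

```python
import cmath
import math


def term_signal(tokens, term, n_bins):
    """Count occurrences of `term` in each of `n_bins` equal-width position bins."""
    signal = [0.0] * n_bins
    for pos, tok in enumerate(tokens):
        if tok == term:
            signal[pos * n_bins // len(tokens)] += 1.0
    return signal


def dft(signal):
    """Plain O(n^2) discrete Fourier transform (adequate for a handful of bins)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def spatial_score(tokens, query_terms, n_bins=8):
    """Hypothetical score: for each non-zero frequency, the summed magnitude of
    the query-term spectra, weighted by how closely their phases agree."""
    spectra = [dft(term_signal(tokens, q, n_bins)) for q in query_terms]
    score = 0.0
    # Skip k = 0 (plain term counts) so that only positional structure contributes.
    for k in range(1, n_bins // 2 + 1):
        comps = [s[k] for s in spectra]
        magnitude = sum(abs(c) for c in comps)
        if magnitude == 0:
            continue
        # Phase agreement: length of the mean unit phasor across query terms.
        units = [c / abs(c) for c in comps if abs(c) > 1e-12]
        agreement = abs(sum(units)) / len(comps)
        score += magnitude * agreement
    return score


# Two toy documents with identical term counts but different term positions.
near = ["fourier", "ranking"] + ["x"] * 14
far = ["fourier"] + ["x"] * 7 + ["ranking"] + ["x"] * 7
query = ["fourier", "ranking"]

near_score = spatial_score(near, query)
far_score = spatial_score(far, query)
print(near_score > far_score)  # prints True
```

Both toy documents contain each query term exactly once, so a bag-of-words similarity based on term counts alone cannot separate them. In this sketch the document whose query terms fall in the same bin scores higher, because the spectra of its term signals agree in phase at every frequency.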
