A Graph-Based, Metric Space Proximity Calculator for Internet Objects

Proximity calculations in metric spaces have found new applications with the increased emphasis on Internet searching. In this work we present a novel approach to Internet searching that combines metric space distance calculations with link analysis to determine the proximity of Internet objects. The framework is generic and can be applied to domains such as help desk support, human resources, or geographical information systems. We provide the mathematical preliminaries, give a detailed example within the text mining domain, and discuss our results.

KEYWORDS: Internet searching, metric space, distance measurement, graph applications.
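To make the idea of combining a content distance with a link-based distance concrete, here is a minimal sketch in Python. It is not the method of the paper, whose formulas are not given in this abstract. Everything in it is an assumption chosen for illustration: the angular distance over term-frequency vectors, the capped hop-count distance over an undirected link graph, the weight `alpha`, the cap of 6 hops, and the sample pages `p1` to `p3`. Each component satisfies the triangle inequality, so their weighted sum does too (strictly a pseudometric, since two pages with proportional term vectors have content distance 0).

```python
import math
from collections import deque


def angular_distance(u, v):
    """Angle between two sparse term vectors, scaled to [0, 1] (assumed content distance)."""
    dot = sum(u.get(term, 0.0) * w for term, w in v.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("term vectors must be non-zero")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos) / math.pi


def undirected(edges):
    """Build a symmetric adjacency map so that hop counts are symmetric."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def link_distance(links, a, b, cap=6):
    """Breadth-first hop count from a to b, capped at `cap` and scaled to [0, 1]."""
    if a == b:
        return 0.0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops >= cap:
            continue  # do not search beyond the cap
        for nxt in links.get(node, ()):
            if nxt == b:
                return (hops + 1) / cap
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return 1.0  # unreachable or farther than the cap


def proximity(content, links, a, b, alpha=0.5):
    """Weighted sum of content and link distances (hypothetical combination rule)."""
    return (alpha * angular_distance(content[a], content[b])
            + (1 - alpha) * link_distance(links, a, b))


# Hypothetical sample data: three pages with term counts and two links.
content = {
    "p1": {"metric": 2.0, "space": 1.0},
    "p2": {"metric": 1.0, "graph": 1.0},
    "p3": {"graph": 2.0, "link": 1.0},
}
links = undirected([("p1", "p2"), ("p2", "p3")])

if __name__ == "__main__":
    for a, b in [("p1", "p2"), ("p2", "p3"), ("p1", "p3")]:
        print(a, b, round(proximity(content, links, a, b), 3))
```

Smaller values mean closer objects. Setting `alpha` to 1 ignores the link structure and setting it to 0 ignores the content, so the weight is the knob for trading off the two sources of evidence in this sketch.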
