Metric information filtering

The traditional problem of similarity search requires finding, within a set of points, those that are closest to a query point q according to a distance function d. In this paper we introduce the novel problem of metric information filtering (MIF): in this scenario, each point x_i comes with its own distance function d_i, and the task is to efficiently determine those points that are close enough, according to d_i, to a query point q. MIF can be seen as an extension of both the similarity search problem and the approaches currently used in content-based information filtering, since in MIF user profiles (points) and new items (queries) are compared using arbitrary, personalized metrics. We introduce the basic concepts of MIF and provide alternative resolution strategies that aim to reduce processing costs. Our experimental results show that the proposed solutions are indeed effective in reducing evaluation costs.
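To make the problem statement concrete, the following minimal Python sketch shows a naive sequential baseline for MIF: every profile x_i is checked against the query q using its own distance d_i. It is only an illustration, not one of the paper's resolution strategies. The `Profile` class, the per-profile `radius` used to model "close enough", the `weighted_l1` family of distances, and the sample data are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


@dataclass
class Profile:
    """A user profile x_i with its personalized distance d_i (hypothetical structure)."""
    name: str
    point: Point        # x_i
    distance: Distance  # d_i
    radius: float       # assumed threshold modelling "close enough"


def weighted_l1(weights: Sequence[float]) -> Distance:
    """Build a weighted Manhattan distance; one example of a personalized metric."""
    def d(a: Point, b: Point) -> float:
        return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))
    return d


def naive_filter(profiles: List[Profile], q: Point) -> List[str]:
    """Evaluate every d_i(x_i, q): one distance computation per profile."""
    return [p.name for p in profiles if p.distance(p.point, q) <= p.radius]


profiles = [
    Profile("alice", (0.0, 0.0), weighted_l1((1.0, 1.0)), 1.5),
    Profile("bob", (0.0, 0.0), weighted_l1((3.0, 1.0)), 1.5),
    Profile("carol", (2.0, 2.0), weighted_l1((1.0, 0.1)), 1.5),
]

# A new item arrives and is matched against all profiles.
matches = naive_filter(profiles, (1.0, 0.0))
print(matches)
```

In this sketch, alice and bob sit at the same point, yet only alice matches the query because bob's metric weighs the first coordinate more heavily. The baseline computes one distance per profile for every incoming item, which is the evaluation cost that any more efficient strategy would try to reduce.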
