RAF: An Activation Framework for Refining Similarity Queries Using Learning Techniques

In many applications involving similarity search, a user may lack an exact specification of their information need, or may be unable to formulate a query that exactly captures their notion of similarity. A promising way to mitigate this problem is to let the user submit a rough approximation of the desired query and then use relevance feedback on the retrieved objects to refine it. In this paper, we explore such a refinement strategy for a general class of structured similarity queries. Our approach casts refinement as a concept-learning problem in which the tuples on which the user provides feedback serve as a labeled training set. Under this setup, similarity query refinement consists of two learning tasks: learning the structure of the query and learning the relative importance of its components. The paper develops machine learning approaches suited to each task. Its primary contribution is the Refinement Activation Framework (RAF), which decides when each learner is invoked. Experimental analysis over many real-life datasets shows that our strategy significantly outperforms existing approaches in terms of retrieval quality.
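
To make the second learning task (the relative importance of query components) concrete, the following is a minimal sketch of feedback-driven reweighting for a weighted-sum similarity query. It is an illustration only and is not the learner used in RAF: the names `score` and `reweight`, the `rate` parameter, the predicate names, and the mean-difference update rule are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

Sims = Dict[str, float]  # per-predicate similarity scores in [0, 1] (assumed representation)


def score(sims: Sims, weights: Dict[str, float]) -> float:
    """Weighted-sum similarity of one tuple, normalized by the total weight."""
    total = sum(weights.values())
    return sum(weights[p] * sims[p] for p in weights) / total


def reweight(
    weights: Dict[str, float],
    feedback: List[Tuple[Sims, int]],
    rate: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical update rule: raise the weight of predicates that score
    higher on relevant tuples (label +1) than on non-relevant ones (label -1)."""
    updated = {}
    for p, w in weights.items():
        rel = [s[p] for s, label in feedback if label > 0]
        non = [s[p] for s, label in feedback if label < 0]
        mean_rel = sum(rel) / len(rel) if rel else 0.0
        mean_non = sum(non) / len(non) if non else 0.0
        # Clamp at zero so a predicate can be switched off but never go negative.
        updated[p] = max(0.0, w + rate * (mean_rel - mean_non))
    total = sum(updated.values())
    if total == 0:
        return dict(weights)  # no usable signal: keep the old weights
    return {p: w / total for p, w in updated.items()}


# Illustrative feedback: one relevant and one non-relevant tuple.
weights = {"price": 0.5, "location": 0.5}
feedback = [
    ({"price": 0.9, "location": 0.2}, +1),
    ({"price": 0.1, "location": 0.8}, -1),
]
new_weights = reweight(weights, feedback)
```

In this toy run the `price` predicate separates the relevant tuple from the non-relevant one, so its weight grows relative to `location`. Learning the structure of the query, the first task, is a separate problem that this sketch does not address.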
