Efficient top-k hyperplane query processing for multimedia information retrieval

A query can be answered by a binary classifier that separates the instances relevant to the query from those that are not. When kernel methods are used to train such a classifier, the class boundary is represented as a hyperplane in a projected space. Data instances farthest from the hyperplane are deemed most relevant to the query, and those nearest to the hyperplane are deemed most uncertain with respect to the query. In this paper, we address the twin problems of efficiently retrieving the approximate set of instances (a) farthest from and (b) nearest to a query hyperplane. Retrieval of instances for this hyperplane-based query scenario is mapped to the range-query problem, which allows existing index structures to be reused. Empirical evaluation on large image datasets confirms the effectiveness of our approach.
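To make the two query types concrete, the following is a minimal sketch of the exhaustive baseline that scans every instance. It is not the index-based approximate method the paper proposes. It assumes a linear hyperplane given by a weight vector `w` and offset `b` in the input space (rather than a kernel-induced projected space), and it assumes that "farthest" means the largest signed distance on the positive side of the hyperplane. The function names and the toy data are hypothetical.

```python
import heapq
import math


def signed_distance(x, w, b):
    """Signed Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def top_k_farthest(points, w, b, k):
    """Indices of the k points with the largest signed distance.

    Assumption: relevance is measured on the positive side of the hyperplane.
    """
    return heapq.nlargest(
        k, range(len(points)), key=lambda i: signed_distance(points[i], w, b)
    )


def top_k_nearest(points, w, b, k):
    """Indices of the k points with the smallest absolute distance (most uncertain)."""
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: abs(signed_distance(points[i], w, b))
    )


# Toy data (hypothetical): five 2-D points and the hyperplane x0 = 0.
points = [(3.0, 0.0), (0.5, 1.0), (-2.0, 4.0), (1.0, -1.0), (-0.1, 0.0)]
w, b = (1.0, 0.0), 0.0

farthest = top_k_farthest(points, w, b, 2)  # most relevant candidates
nearest = top_k_nearest(points, w, b, 2)    # most uncertain candidates
print(farthest, nearest)
```

A linear scan like this visits every instance once per query, which is the cost that mapping the problem to range queries over an existing index structure is meant to avoid.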
