Exploiting Geometry for Support Vector Machine Indexing

Support Vector Machines (SVMs) have been adopted by many data-mining and information-retrieval applications for learning a mining or query concept, and then retrieving the “top-k” best matches to the concept. However, when the dataset is large, naively scanning the entire dataset to find the top matches is not scalable. In this work, we propose a kernel indexing strategy to substantially prune the search space and thus improve the performance of top-k queries. Our kernel indexer (KDX) takes advantage of the underlying geometric properties and quickly converges on an approximate set of top-k instances of interest. More importantly, once the kernel (e.g., Gaussian kernel) has been selected and the indexer has been constructed, the indexer can work with different kernel-parameter settings (e.g., γ and σ) without performance compromise. Through theoretical analysis and empirical studies on a wide variety of datasets, we demonstrate KDX to be very effective.
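To make the baseline concrete, the naive full scan that the abstract calls not scalable can be sketched as follows. This is not KDX itself, only an illustration of the top-k query it aims to speed up. The function names, the Gaussian kernel form exp(-γ‖x − y‖²), and the toy support vectors are assumptions made for this sketch and do not come from the paper.

```python
import heapq
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svm_score(x, support_vectors, coeffs, bias, gamma):
    """SVM decision value: sum_i coeff_i * K(sv_i, x) + bias.

    Each coefficient is assumed to already include the label sign
    (alpha_i * y_i), as in the usual dual formulation.
    """
    total = sum(
        c * rbf_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coeffs)
    )
    return total + bias


def naive_top_k(dataset, k, support_vectors, coeffs, bias, gamma):
    """Score every instance and return the k highest (score, index) pairs.

    Every query touches all instances and all support vectors, so the
    cost grows linearly with the dataset size.
    """
    scored = (
        (svm_score(x, support_vectors, coeffs, bias, gamma), i)
        for i, x in enumerate(dataset)
    )
    return heapq.nlargest(k, scored)


# Toy example: one positive support vector at (0, 0), one negative at (4, 4).
support_vectors = [(0.0, 0.0), (4.0, 4.0)]
coeffs = [1.0, -1.0]
dataset = [(0.0, 0.0), (4.0, 4.0), (1.0, 0.0), (3.0, 4.0), (2.0, 2.0)]

top = naive_top_k(dataset, k=2, support_vectors=support_vectors,
                  coeffs=coeffs, bias=0.0, gamma=0.5)
print([index for _, index in top])
```

Because each query evaluates the kernel against every instance, the work is proportional to the dataset size times the number of support vectors. That linear cost is what an index such as KDX is meant to avoid by pruning the search space.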

Edward Y. Chang | Navneet Panda
