MKL-tree: an index structure for high-dimensional vector spaces

In this work, a novel hierarchical data structure for high dimensional data indexing is proposed. MKL-tree is based on dimensionality reduction operated by means of the MKL transform, a multi-space generalization of the KL transform. A local dimensionality reduction is performed at each node of the tree, allowing more selective features to be extracted and thus increasing the discriminating power of the index. The mathematical foundation for nodes and leaves representation and for the techniques aimed to manage the structure is detailed. Moreover, the algorithms for bulk loading MKL-tree (i.e., for creating the tree given a large number of objects simultaneously), for updating and splitting nodes after the insertion of new objects and for performing similarity searches are described. Results are reported for the comparison of MKL-tree with other well-known access methods in terms of I/O and CPU costs and precision of the result in the execution of similarity queries.

[1]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[2]  Thomas S. Huang,et al.  Supporting similarity queries in MARS , 1997, MULTIMEDIA '97.

[3]  Bernhard Seeger,et al.  A Generic Approach to Bulk Loading Multidimensional Index Structures , 1997, VLDB.

[4]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[5]  Andreas Reuter,et al.  Transaction Processing: Concepts and Techniques , 1992 .

[6]  Christos Faloutsos,et al.  The TV-tree: An index structure for high-dimensional data , 1994, The VLDB Journal.

[7]  R. H. Wilcox Adaptive control processes—A guided tour, by Richard Bellman, Princeton University Press, Princeton, New Jersey, 1961, 255 pp., $6.50 , 1961 .

[8]  Alessandra Lumini,et al.  Eigenspace merging for model updating , 2002, Object recognition supported by user interaction for service robots.

[9]  Patrick M Kelly An Algorithm for Merging Hyperellipsoidal Clusters , 1994 .

[10]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[11]  Anil K. Jain,et al.  Unsupervised Learning of Finite Mixture Models , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Sharad Mehrotra,et al.  The hybrid tree: an index structure for high dimensional feature spaces , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[13]  Ambuj K. Singh,et al.  Dimensionality Reduction for Similarity Searching in Dynamic Databases , 1999, Comput. Vis. Image Underst..

[14]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[15]  Chin-Wan Chung,et al.  The GC-tree: a high-dimensional index structure for similarity search in image databases , 2002, IEEE Trans. Multim..

[16]  Xiaoming Zhu,et al.  An efficient indexing method for nearest neighbor searches in high-dirnensional image databases , 2002, IEEE Trans. Multim..

[17]  Akhil Kumar G-Tree: A New Data Structure for Organizing Multidimensional Data , 1994, IEEE Trans. Knowl. Data Eng..

[18]  Marios Hadjieleftheriou,et al.  R-Trees - A Dynamic Index Structure for Spatial Searching , 2008, ACM SIGSPATIAL International Workshop on Advances in Geographic Information Systems.

[19]  Masatoshi Yoshikawa,et al.  The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation , 2000, VLDB.

[20]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[21]  Sharad Mehrotra,et al.  Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces , 2000, VLDB.

[22]  Alexander Thomasian,et al.  CSVD: Clustering and Singular Value Decomposition for Approximate Similarity Search in High-Dimensional Spaces , 2003, IEEE Trans. Knowl. Data Eng..

[23]  Gautam Biswas,et al.  Unsupervised Learning with Mixed Numeric and Nominal Data , 2002, IEEE Trans. Knowl. Data Eng..

[24]  Andrew R. Webb,et al.  Statistical Pattern Recognition , 1999 .

[25]  Ramesh C. Jain,et al.  Similarity indexing with the SS-tree , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[26]  Divyakant Agrawal,et al.  Vector approximation based indexing for non-uniform high dimensional data sets , 2000, CIKM '00.

[27]  H. Buchner The Grid File : An Adaptable , Symmetric Multikey File Structure , 2001 .

[28]  Hanan Samet,et al.  The Design and Analysis of Spatial Data Structures , 1989 .

[29]  Christian Böhm,et al.  Efficient Bulk Loading of Large High-Dimensional Indexes , 1999, DaWaK.

[30]  Beng Chin Ooi,et al.  Contorting high dimensional data for efficient main memory KNN processing , 2003, SIGMOD '03.

[31]  Aidong Zhang,et al.  ClusterTree: Integration of Cluster Representation and Nearest-Neighbor Search for Large Data Sets with High Dimensions , 2003, IEEE Trans. Knowl. Data Eng..

[32]  Edward Y. Chang,et al.  Clustering for Approximate Similarity Search in High-Dimensional Spaces , 2002, IEEE Trans. Knowl. Data Eng..

[33]  Dario Maio,et al.  Similarity search using multi-space KL , 1999, Proceedings. Tenth International Workshop on Database and Expert Systems Applications. DEXA 99.

[34]  Christian Böhm,et al.  Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases , 2001, CSUR.

[35]  Keinosuke Fukunaga,et al.  Statistical Pattern Recognition , 1993, Handbook of Pattern Recognition and Computer Vision.

[36]  David Salomon,et al.  Data Compression: The Complete Reference , 2006 .

[37]  Tian Zhang,et al.  BIRCH: A New Data Clustering Algorithm and Its Applications , 1997, Data Mining and Knowledge Discovery.

[38]  Philip S. Yu,et al.  Fast algorithms for projected clustering , 1999, SIGMOD '99.

[39]  Christian Böhm,et al.  Independent quantization: an index compression technique for high-dimensional data spaces , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[40]  Dario Maio,et al.  Multispace KL for Pattern Representation and Classification , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[41]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[42]  Christos Faloutsos,et al.  Hilbert R-tree: An Improved R-tree using Fractals , 1994, VLDB.

[43]  Oliver Günther,et al.  Multidimensional access methods , 1998, CSUR.