Maximal metric margin partitioning for similarity search indexes

We propose a partitioning scheme for similarity search indexes that is called Maximal Metric Margin Partitioning (MMMP). MMMP divides the data on the basis of its distribution pattern, especially for the boundaries of clusters. A partitioning surface created by MMMP is likely to be at maximum distances from the two cluster boundaries. MMMP is the first similarity search index approach to focus on partitioning surfaces and data distribution patterns. We also present an indexing scheme, named the MMMP-Index, which uses MMMP and small ball partitioning. The MMMP-Index prunes many objects that are not relevant to a query, and it reduces the query execution cost. Our experimental results show that MMMP effectively indexes clustered data and reduces the search cost. For clustered vector data, the MMMP-Index reduces the computational cost to less than two thirds that of comparable schemes.

[1]  E. Chavez,et al.  Pivot selection techniques for proximity searching in metric spaces , 2001, SCCC 2001. 21st International Conference of the Chilean Computer Science Society.

[2]  Gonzalo Navarro,et al.  A compact space decomposition for effective metric indexing , 2005, Pattern Recognit. Lett..

[3]  Pavel Zezula,et al.  Similarity Search: The Metric Space Approach (Advances in Database Systems) , 2005 .

[4]  Pavel Zezula,et al.  D-Index: Distance Searching Index for Metric Data Sets , 2003, Multimedia Tools and Applications.

[5]  Z. Meral Özsoyoglu,et al.  Indexing large metric spaces for similarity search queries , 1999, TODS.

[6]  Tamer Kahveci,et al.  Reference-based indexing of sequence databases , 2006, VLDB.

[7]  BozkayaTolga,et al.  Indexing large metric spaces for similarity search queries , 1999 .

[8]  Yueting Zhuang,et al.  Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach , 2008, EDBT '08.

[9]  Pavel Zezula,et al.  M-tree: An Efficient Access Method for Similarity Search in Metric Spaces , 1997, VLDB.

[10]  Hans-Peter Kriegel,et al.  Multi-step density-based clustering , 2005, Knowledge and Information Systems.

[11]  Beng Chin Ooi,et al.  iDistance: An adaptive B+-tree based indexing method for nearest neighbor search , 2005, TODS.

[12]  Pavel Zezula,et al.  Similarity Search - The Metric Space Approach , 2005, Advances in Database Systems.

[13]  Peter N. Yianilos,et al.  Data structures and algorithms for nearest neighbor search in general metric spaces , 1993, SODA '93.

[14]  Nieves R. Brisaboa,et al.  Spatial Selection of Sparse Pivots for Similarity Search in Metric Spaces , 2007, SOFSEM.

[15]  Hans-Peter Kriegel,et al.  OPTICS: ordering points to identify the clustering structure , 1999, SIGMOD '99.