Similarity Search Problem Research on Multi-dimensional Data Sets

In this paper, we present our continuing work on designing an algorithm to find the nearest neighbors of given queries. In our previous work, we analyzed the situation in which there are multiple queries with different levels of importance, and defined a weight for each query point. We also proposed an algorithm to find the nearest neighbors of multiple weighted queries, and enhanced it based on the distribution of the query points. Here we analyze the data distribution along the various dimensions and apply the shrinking concept to further improve our multi-query search approach.
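To make the weighted multi-query setting concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes that a data point is scored by the weighted sum of its Euclidean distances to the query points, which is one plausible aggregate; the abstract does not specify the actual scoring function. The function name `weighted_multi_query_knn` and all parameters are hypothetical, and the shrinking-based enhancement is not modeled here.

```python
import heapq
import math


def weighted_multi_query_knn(data, queries, weights, k):
    """Return the k data points with the smallest weighted sum of
    Euclidean distances to the query points.

    Assumption: the weighted-sum aggregate is illustrative only; the
    paper's own scoring function is not given in the abstract.
    """
    if len(queries) != len(weights):
        raise ValueError("one weight per query point is required")

    def score(point):
        # A higher weight makes closeness to that query matter more.
        return sum(w * math.dist(point, q) for q, w in zip(queries, weights))

    # Brute-force scan over the data; no index structure is used.
    return heapq.nsmallest(k, data, key=score)


# Usage: two query points, the first considered far more important.
data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (10.0, 10.0)]
queries = [(0.0, 0.0), (10.0, 10.0)]
near_first = weighted_multi_query_knn(data, queries, [0.9, 0.1], k=2)
near_second = weighted_multi_query_knn(data, queries, [0.1, 0.9], k=2)
```

Swapping the weights moves the result set from the neighborhood of the first query to that of the second, which is the behavior the query weights are meant to capture.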
