Efficient KNN search by linear projection of image clusters

K‐nearest neighbors (KNN) search in a high‐dimensional vector space is an important paradigm for a variety of applications. Despite continuous efforts in recent years, algorithms that find the exact KNN answer set in high dimensions are still outperformed by a linear scan. In this paper, we propose a technique to find the exact K nearest image objects to a given query object. The proposed technique first clusters the images using a self‐organizing map algorithm and then projects each cluster to a point in a linear space, based on the distance between that cluster and a selected reference point. These projected points are then organized in a simple, compact, yet fast index structure called the array‐index. Unlike most indexes that support KNN search, the array‐index requires storage space that is linear in the number of projected points. The experiments show that, because of its simplicity and compactness, the proposed technique is more efficient and more robust to dimensionality than other well‐known techniques. © 2011 Wiley Periodicals, Inc.
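The pipeline summarized above (cluster, project each cluster to a one-dimensional key, keep the keys in a sorted array, then search exactly) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the self-organizing map step is replaced by assigning each vector to the nearest of a set of given centroids, the per-cluster radius and the triangle-inequality pruning rule are assumptions about how exactness could be preserved, and the names `build_index` and `knn` are hypothetical. The index is assumed to be non-empty.

```python
import bisect
import heapq
import math


def build_index(points, centroids, ref):
    """Project clusters to 1-D keys (centroid-to-ref distance), sorted by key.

    Assumption: clusters come from nearest-centroid assignment, standing in
    for the SOM clustering named in the abstract.
    """
    members = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        members[nearest].append(p)
    clusters = []
    for c, pts in zip(centroids, members):
        if pts:  # skip empty clusters
            radius = max(math.dist(p, c) for p in pts)
            clusters.append((math.dist(c, ref), radius, pts))
    clusters.sort(key=lambda t: t[0])
    keys = [t[0] for t in clusters]  # one scalar per cluster
    return keys, clusters


def knn(query, k, keys, clusters, ref):
    """Exact KNN: scan clusters outward from the query's projected position."""
    d_q = math.dist(query, ref)
    r_max = max(radius for _, radius, _ in clusters)
    best = []  # max-heap of (-distance, point), at most k entries
    right = bisect.bisect_left(keys, d_q)
    left = right - 1
    while left >= 0 or right < len(keys):
        gap_l = d_q - keys[left] if left >= 0 else math.inf
        gap_r = keys[right] - d_q if right < len(keys) else math.inf
        if gap_l <= gap_r:
            gap, idx = gap_l, left
            left -= 1
        else:
            gap, idx = gap_r, right
            right += 1
        kth = -best[0][0] if len(best) == k else math.inf
        # Triangle inequality: every point in a cluster is at least
        # (gap - radius) away from the query. Remaining clusters have
        # a gap at least this large, so the scan can stop here.
        if gap - r_max >= kth:
            break
        _, radius, pts = clusters[idx]
        if gap - radius >= kth:
            continue  # this cluster alone cannot improve the answer
        for p in pts:
            d = math.dist(query, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-neg_d, p) for neg_d, p in best)


# Illustrative data only; real inputs would be image feature vectors.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0)]
centroids = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
ref = (0.0, 0.0)
keys, clusters = build_index(points, centroids, ref)
print(knn((4.9, 5.2), 2, keys, clusters, ref))
```

In this sketch the index holds one key and one radius per cluster, so its size grows linearly with the number of projected points, which is the storage property the abstract claims for the array-index. The choice of reference point affects only how much is pruned, not the correctness of the result.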
