Approximate k-Nearest Neighbor Search Based on the Earth Mover's Distance for Efficient Content-based Information Retrieval

The Earth Mover's Distance (EMD) is one of the most widely used distance functions for measuring the similarity between two multimedia objects. Although it provides good search results, the EMD is too time-consuming to compute for use in large multimedia databases. To solve this problem, we propose an approximate k-nearest neighbor (k-NN) search method based on the EMD. First, the proposed method builds an index using the M-tree, a distance-based multi-dimensional index structure, to reduce the disk access overhead. When building the index, we reduce the number of features in the multimedia objects through dimensionality reduction. When performing the k-NN search on the M-tree, we retrieve a small set of candidates from disk using the index and then post-process them. Second, the proposed method uses an approximate EMD for index retrieval and post-processing to reduce the computational overhead of the EMD. To compensate for the errors introduced by the approximation, the method provides a way to improve the accuracy of the approximate EMD. We performed extensive experiments to show the efficiency of the proposed method.
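The candidate-then-post-process flow described above can be illustrated with a minimal sketch. It is not the proposed method: it assumes 1-D histograms of equal total mass with unit distance between adjacent bins (where the EMD reduces to a sum of absolute differences of cumulative sums), replaces the M-tree with a linear scan, stands in for dimensionality reduction by merging adjacent bins, and refines candidates with the exact 1-D EMD rather than an approximate EMD. The names `emd_1d`, `reduce_bins`, `approx_knn`, `factor`, and `num_candidates` are hypothetical.

```python
from itertools import accumulate
import heapq


def emd_1d(p, q):
    """EMD between two equal-mass 1-D histograms with unit bin spacing.

    In this special case the EMD equals the sum of absolute differences
    between the two cumulative sums.
    """
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def reduce_bins(hist, factor):
    """Illustrative dimensionality reduction: merge every `factor` adjacent bins."""
    return [sum(hist[i:i + factor]) for i in range(0, len(hist), factor)]


def approx_knn(query, database, k, factor=2, num_candidates=None):
    """Approximate k-NN: filter on reduced histograms, then re-rank candidates.

    The linear scan below is a stand-in for index retrieval; the result is
    approximate because a true neighbor can be dropped in the filter step.
    """
    if num_candidates is None:
        num_candidates = 3 * k  # assumed candidate budget, not from the paper

    reduced_query = reduce_bins(query, factor)

    # Filter step: cheap distance on the reduced representation.
    candidates = heapq.nsmallest(
        num_candidates,
        range(len(database)),
        key=lambda i: emd_1d(reduced_query, reduce_bins(database[i], factor)),
    )

    # Post-processing step: re-rank only the candidates on the full histograms.
    return heapq.nsmallest(
        k, candidates, key=lambda i: emd_1d(query, database[i])
    )


if __name__ == "__main__":
    database = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(approx_knn([1, 0, 0, 0], database, k=2, num_candidates=3))
```

A larger `num_candidates` trades extra post-processing cost for a lower chance of missing a true neighbor, which is the same efficiency/accuracy trade-off the abstract refers to.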
