PCAF: Scalable, High Precision k-NN Search Using Principal Component Analysis Based Filtering

Approximate k Nearest Neighbours (AkNN) search is widely used in domains such as computer vision and machine learning. However, AkNN search on high-dimensional datasets scales poorly on multicore platforms because of its large memory footprint. Current parallel AkNN methods that use space subdivision for filtering reduce the memory footprint, but at a cost in precision. We propose PCAF, a new data filtering method for parallel AkNN search based on principal component analysis (PCA). PCAF improves on previous methods by sustaining high scalability across a wide range of high-dimensional datasets on both Intel and AMD multicore platforms, while maintaining high precision in its AkNN search results.
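The abstract does not describe PCAF's filtering procedure in detail. The Python/NumPy sketch below is for illustration only and shows the general "filter in PCA space, refine in full space" idea behind PCA-based candidate filtering. It is a generic scheme, not the authors' PCAF algorithm. The function name `pca_filter_knn` and its parameters `n_components` and `n_candidates` are hypothetical. The sketch also processes queries sequentially. On a multicore machine, independent queries could be split across threads, but that is not shown here.

```python
import numpy as np

def pca_filter_knn(data, queries, k, n_components=8, n_candidates=64):
    """Hypothetical illustration of PCA-based filtering for approximate k-NN.

    Not the PCAF algorithm itself: a generic filter-and-refine sketch.
    Returns an array of shape (len(queries), k) holding data indices.
    """
    # Centre the data; the principal axes are the right singular vectors.
    mean = data.mean(axis=0)
    centred = data - mean
    # Rows of vt are orthonormal principal directions, strongest first.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components].T          # shape (d, n_components)
    data_proj = centred @ basis          # compact copy: shape (n, n_components)

    results = []
    for q in queries:
        q_proj = (q - mean) @ basis
        # Filter: cheap squared distances in the low-dimensional subspace.
        coarse = np.sum((data_proj - q_proj) ** 2, axis=1)
        m = min(n_candidates, len(data))
        cand = np.argpartition(coarse, m - 1)[:m]   # m smallest, unordered
        # Refine: exact squared distances in the original space, candidates only.
        fine = np.sum((data[cand] - q) ** 2, axis=1)
        results.append(cand[np.argsort(fine)[:k]])
    return np.array(results)
```

The basis has orthonormal columns, so a distance measured in the projected subspace never exceeds the true distance. Coarse filtering also only reads `n_components` coordinates per point rather than all `d`. Precision depends on how many true neighbours survive the `n_candidates` cut. With `n_candidates` equal to the dataset size, the result is exact.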
