Distributed Kernel Matrix Approximation and Implementation Using Message Passing Interface

We propose a distributed method for computing similarity (also known as kernel or Gram) matrices, which are used in many kernel-based machine learning algorithms. Existing methods for computing similarity matrices have quadratic time and space complexities, which prevents them from scaling to large data sets. To reduce these quadratic complexities, the proposed method first partitions the data into smaller subsets using several families of locality-sensitive hashing, including random projection and spectral hashing. It then computes similarity values only among points within the same subset, yielding an approximate similarity matrix. We show analytically that the time and space complexities of the proposed method are subquadratic. We implemented the method using the Message Passing Interface (MPI) framework and ran it on a cluster. Our results on real large-scale data sets show that the proposed method does not significantly affect the accuracy of the computed similarity matrices, while achieving substantial savings in running time and memory requirements.
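
A minimal sketch of the partition-then-compute idea, assuming an RBF (Gaussian) kernel and random-projection LSH; the function names, the number of hash bits, and the choice to leave cross-bucket entries at zero are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from collections import defaultdict

def lsh_buckets(X, n_bits=8, seed=0):
    """Partition rows of X by the sign pattern of n_bits random projections."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], n_bits))  # random hyperplane normals
    codes = (X @ R) > 0                            # one boolean hash code per point
    buckets = defaultdict(list)
    for i, code in enumerate(codes):
        buckets[code.tobytes()].append(i)          # same code -> same bucket
    return list(buckets.values())

def rbf_block(X, idx, gamma=1.0):
    """Exact RBF kernel restricted to the points indexed by idx."""
    Z = X[idx]
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def approx_kernel(X, n_bits=8, gamma=1.0):
    """Approximate kernel matrix: exact entries within buckets, zero elsewhere.
    A dense matrix is used here only for clarity; a real implementation would
    keep just the nonzero blocks to realize the memory savings."""
    n = X.shape[0]
    K = np.zeros((n, n))
    for idx in lsh_buckets(X, n_bits):
        K[np.ix_(idx, idx)] = rbf_block(X, idx, gamma)
    return K
```

Because LSH places nearby points in the same bucket with high probability, the large similarities are concentrated inside buckets, and the within-bucket work is far below the quadratic cost of the exact matrix.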

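The distributed step could be organized as sketched below with mpi4py (an assumption; the paper states only that the implementation uses the MPI framework on a cluster). The round-robin bucket assignment is illustrative, and lsh_buckets and rbf_block are the helpers from the sketch above:

```python
# Requires mpi4py and the lsh_buckets / rbf_block helpers defined above.
# Run with, e.g.: mpiexec -n 4 python distributed_kernel.py  (hypothetical filename)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    X = np.random.default_rng(1).standard_normal((10_000, 64))  # toy data
    buckets = lsh_buckets(X)
    work = [buckets[r::size] for r in range(size)]  # round-robin over ranks
else:
    X, work = None, None

X = comm.bcast(X, root=0)                # every rank needs the raw points
my_buckets = comm.scatter(work, root=0)  # each rank receives its buckets

# Buckets are disjoint, so each rank computes its kernel blocks with no
# further communication.
my_blocks = [(idx, rbf_block(X, idx)) for idx in my_buckets]
print(f"rank {rank}: computed {len(my_blocks)} kernel blocks")
```

Since the blocks never overlap, the computation is embarrassingly parallel after the scatter, which is what makes the running-time savings scale with the number of ranks.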