Communication Efficient Distributed Kernel Principal Component Analysis

Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. When a large volume of high-dimensional data is collected in a distributed fashion, it becomes very costly to communicate all of it to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication-efficient fashion while maintaining strong, provable guarantees on solution quality? In this paper, we give an affirmative answer by developing a communication-efficient algorithm for kernel PCA in the distributed setting. The algorithm combines subspace embedding and adaptive sampling techniques; we show that it can take as input an arbitrary configuration of distributed datasets and compute a set of global kernel principal components with relative-error guarantees independent of the dimension of the feature space and of the total number of data points. In particular, computing k principal components with relative error ε over s workers has communication cost Õ(spk/ε + sk²/ε³) words, where p is the average number of nonzero entries in each data point. Furthermore, we evaluated the algorithm on large-scale real-world datasets and showed that it produces a high-quality kernel PCA solution while using significantly less communication than alternative approaches.

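The exact algorithm from the paper is not reproduced here, but the communication pattern the abstract describes (each worker sends a small feature-space summary instead of its raw data, and a coordinator extracts global principal components) can be illustrated with a minimal sketch. The sketch below substitutes random Fourier features for the paper's subspace-embedding and adaptive-sampling steps, so it is an assumption-laden approximation of the idea rather than the authors' method; the function names (random_fourier_features, worker_summary, coordinator_kpca) and parameters (n_features, gamma) are illustrative, and feature-space centering is omitted for brevity.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    # Approximate the RBF-kernel feature map phi(x) with random Fourier features.
    # (Stand-in for the paper's subspace-embedding step; the actual algorithm
    # uses a different embedding with provable relative-error guarantees.)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def worker_summary(X_local, n_features, gamma, seed):
    # Each worker embeds its local shard and ships only a small summary
    # (here: the embedded second-moment matrix), not the raw points.
    rng = np.random.default_rng(seed)  # shared seed => all workers use the same embedding
    Z = random_fourier_features(X_local, n_features, gamma, rng)
    return Z.T @ Z, X_local.shape[0]

def coordinator_kpca(summaries, k):
    # Aggregate the worker summaries and extract the top-k principal
    # directions in the (approximate) kernel feature space.
    C = sum(S for S, _ in summaries)
    n = sum(m for _, m in summaries)
    eigvals, eigvecs = np.linalg.eigh(C / n)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]

# Toy usage: four "workers", each holding a random local shard.
rng = np.random.default_rng(0)
shards = [rng.normal(size=(500, 20)) for _ in range(4)]
summaries = [worker_summary(X, n_features=256, gamma=0.5, seed=42) for X in shards]
vals, components = coordinator_kpca(summaries, k=5)
print(vals)
```

In this sketch the per-worker communication is one n_features × n_features matrix, independent of the number of local data points, which is the kind of data-size-independent cost the abstract's Õ(spk/ε + sk²/ε³) bound formalizes for the actual algorithm.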