Parallelization with Multiplicative Algorithms for Big Data Mining

We propose a nontrivial strategy to parallelize a series of data mining and machine learning problems, including one-class and two-class support vector machines, nonnegative least squares problems, and $\ell_1$-regularized regression (LASSO) problems. The strategy leads to extremely simple multiplicative algorithms that can be implemented directly in parallel computing environments such as MapReduce or CUDA. We provide a rigorous analysis of the correctness and convergence of the algorithms, and we demonstrate their scalability and accuracy in comparison with other current leading algorithms.
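The abstract does not state the update rule itself. As a point of reference, here is a minimal pure-Python sketch of the well-known multiplicative update of Sha, Saul, and Lee for nonnegative quadratic programming, $\min_x \tfrac12 x^\top Q x + c^\top x$ subject to $x \ge 0$. This is a swapped-in illustration, not necessarily the rule derived in this paper. The function names (`nqp_multiplicative`, `nnls_multiplicative`, `split_pos_neg`, `matvec`) are hypothetical, and $Q$ is assumed symmetric positive semidefinite with a positive diagonal. The sketch shows why such algorithms parallelize easily: after two matrix-vector products, every coordinate is rescaled independently of the others.

```python
import math


def split_pos_neg(Q):
    """Split Q elementwise into nonnegative parts so that Q = Qp - Qn."""
    Qp = [[max(q, 0.0) for q in row] for row in Q]
    Qn = [[max(-q, 0.0) for q in row] for row in Q]
    return Qp, Qn


def matvec(M, x):
    """Dense matrix-vector product for lists of lists."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def nqp_multiplicative(Q, c, iters=500, x0=None):
    """Approximately minimize 0.5 * x^T Q x + c^T x subject to x >= 0.

    Sha-Saul-Lee multiplicative update (illustrative sketch; Q is assumed
    symmetric positive semidefinite with a positive diagonal).
    """
    Qp, Qn = split_pos_neg(Q)
    # Start strictly positive: a coordinate that reaches 0 stays at 0.
    x = list(x0) if x0 is not None else [1.0] * len(c)
    for _ in range(iters):
        a = matvec(Qp, x)  # (Q+ x)_i: needs the whole current vector x
        b = matvec(Qn, x)  # (Q- x)_i: likewise
        # Each coordinate is rescaled using only its own a_i, b_i, c_i, so
        # this step could be split across MapReduce tasks or CUDA threads.
        # The factor is always >= 0, so x stays nonnegative.
        x = [
            xi * (-ci + math.sqrt(ci * ci + 4.0 * ai * bi)) / (2.0 * ai)
            if ai > 0.0 else xi
            for xi, ai, bi, ci in zip(x, a, b, c)
        ]
    return x


def nnls_multiplicative(A, y, iters=500):
    """Nonnegative least squares, min ||A x - y||^2 with x >= 0,
    rewritten as the quadratic program with Q = A^T A and c = -A^T y."""
    cols = list(zip(*A))  # columns of A
    Q = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    c = [-sum(p * yk for p, yk in zip(ci, y)) for ci in cols]
    return nqp_multiplicative(Q, c, iters)


if __name__ == "__main__":
    print(nnls_multiplicative([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                              [1.0, 2.0, 3.0]))
```

For the nonnegative least squares case, when $Q = A^\top A$ has no negative entries and $A^\top y$ is positive, the update reduces to the Lee-Seung ratio $x_i \leftarrow x_i (A^\top y)_i / (A^\top A x)_i$. The example above has unconstrained least squares solution $(1, 2)$, which is already nonnegative, so the iterates should approach it.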
