Regularized and sparse stochastic k-means for distributed large-scale clustering

In this paper we present a novel clustering approach based on the stochastic learning paradigm and regularization with l1-norms. Our approach extends the well-known K-Means algorithm. We introduce a simple regularized dual averaging scheme for learning prototype vectors (centroids) with l1-norms in a stochastic mode. The learning of the individual prototype vectors is distributed across clusters, while the re-assignment of cluster memberships is performed only for a fixed number of outer iterations. This re-assignment step is identical to the one in the original K-Means algorithm: it re-shuffles the pool of samples per cluster according to the learned centroids. We report an extended evaluation and comparison of our approach against various clustering techniques, such as randomized K-Means and Proximal Plane Clustering. Our experimental studies indicate that the proposed method yields better prototype vectors and corresponding cluster memberships, while performing feature selection via l1-norm minimization. A sketch of the overall scheme is given below.
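The following is a minimal Python sketch of the scheme described above, not the authors' implementation. It assumes a squared-Euclidean per-sample loss 0.5*||x - c||^2 and the simple regularized dual averaging (RDA) step with l1 regularizer lam*||c||_1 and an auxiliary strongly convex term scaled by gamma/sqrt(t); the function and parameter names (rda_centroid, sparse_stochastic_kmeans, lam, gamma) are illustrative. Each centroid is learned independently from its own cluster's sample pool, so the k inner loops parallelize naturally, and the outer loop re-shuffles memberships exactly as in K-Means.

```python
import numpy as np

def soft_threshold(v, lam):
    # Entrywise soft-thresholding: shrink each coordinate toward zero by lam.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def rda_centroid(X, c0, lam=0.1, gamma=1.0, epochs=1, rng=None):
    # Learn one l1-regularized centroid from the samples currently assigned
    # to its cluster, using the simple RDA update in a stochastic mode.
    if rng is None:
        rng = np.random.default_rng(0)
    c = c0.copy()
    g_bar = np.zeros_like(c0)   # running average of gradients
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            g = c - X[i]                    # gradient of 0.5 * ||x - c||^2
            g_bar += (g - g_bar) / t
            # Closed-form RDA step for Psi(c) = lam * ||c||_1 with
            # auxiliary term (gamma / (2 * sqrt(t))) * ||c||_2^2:
            c = -(np.sqrt(t) / gamma) * soft_threshold(g_bar, lam)
    return c

def sparse_stochastic_kmeans(X, k, outer_iters=10, lam=0.1, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(outer_iters):
        # Re-shuffle memberships exactly as in ordinary K-Means.
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        # Each centroid depends only on its own sample pool, so these
        # k updates are independent and can run on separate workers.
        for j in range(k):
            if np.any(labels == j):
                C[j] = rda_centroid(X[labels == j], C[j], lam, gamma, rng=rng)
    return C, labels

# Usage sketch on synthetic data:
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(m, 0.3, size=(100, 20)) for m in (-2.0, 0.0, 2.0)])
    C, labels = sparse_stochastic_kmeans(X, k=3, lam=0.05)
    print("zero coordinates per centroid:", (np.abs(C) < 1e-12).sum(axis=1))
```

Because g_bar averages the full gradient history, the closed-form step produces exact zeros in every coordinate whose average gradient stays below lam in magnitude; this is the mechanism behind the l1-driven feature selection mentioned in the abstract.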
