Nonsmooth Penalized Clustering via $\ell _{p}$ Regularized Sparse Regression

Clustering is widely used in data analysis. Most existing clustering approaches assume that the number of clusters is given in advance. Recently, a clustering framework has been proposed that can learn the number of clusters automatically from the training data. Building on this line of work, we propose a nonsmooth penalized clustering model via $\ell_p$ ($0<p<1$) regularized sparse regression. The model is formulated as a nonsmooth nonconvex optimization problem. It is based on over-parameterization and uses an $\ell_p$-norm-based regularization to control the tradeoff between model fit and the number of clusters. We prove theoretically that the new model guarantees the sparseness of cluster centers. To make the model practical, we adopt an easy-to-compute criterion and a strategy that narrows the search interval for cross-validation. To handle the nonsmoothness and nonconvexity of the cost function, we propose a simple smoothing trust region algorithm and analyze its convergence and computational complexity. Numerical studies on both simulated and real-world data sets support our theoretical results and demonstrate the advantages of the new method.
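To make the over-parameterization idea concrete, here is a minimal sketch under stated assumptions. It is not the paper's exact model. Each data point $x_i$ gets its own center $m_i$, and an assumed objective is $\tfrac{1}{2}\sum_i \lVert x_i - m_i\rVert^2 + \lambda \sum_{i<j}\sum_k |m_{ik} - m_{jk}|^p$. The second term is a pairwise fusion penalty in the spirit of fusion-penalty clustering. Points whose centers coincide form one cluster, so the number of clusters is read off from the solution rather than fixed in advance. The $|t|^p$ term is nonsmooth at $t=0$. To illustrate the smoothing idea, the sketch replaces $|t|$ with a standard smoothing function $s(t,\mu)$. This function equals $|t|$ for $|t|>\mu$ and $t^2/(2\mu)+\mu/2$ otherwise, so it is continuously differentiable. The names `smooth_abs`, `objective`, and `count_clusters` are hypothetical helpers. The trust region solver itself is not reproduced here.

```python
import math
from itertools import combinations


def smooth_abs(t, mu):
    """Smoothing of |t|: exact for |t| > mu, quadratic near zero.

    Continuous and with continuous derivative at |t| = mu (value mu, slope 1).
    """
    a = abs(t)
    if a > mu:
        return a
    return t * t / (2.0 * mu) + mu / 2.0


def objective(X, M, lam, p, mu=None):
    """Assumed over-parameterized l_p clustering objective (illustrative only).

    X: list of data points (lists of floats); M: one center per data point.
    Fit term: 0.5 * sum_i ||x_i - m_i||^2.
    Penalty:  lam * sum_{i<j} sum_k |m_ik - m_jk|^p, with 0 < p < 1.
    If mu is None the exact nonsmooth penalty is used; otherwise |.| is
    replaced by smooth_abs(., mu), giving a smooth approximation.
    """
    fit = 0.5 * sum((xk - mk) ** 2
                    for x, m in zip(X, M) for xk, mk in zip(x, m))
    pen = 0.0
    for i, j in combinations(range(len(M)), 2):
        for a, b in zip(M[i], M[j]):
            d = a - b
            r = abs(d) if mu is None else smooth_abs(d, mu)
            pen += r ** p
    return fit + lam * pen


def count_clusters(M, tol=1e-6):
    """Count distinct centers; points sharing a center form one cluster."""
    reps = []
    for m in M:
        if not any(max(abs(a - b) for a, b in zip(m, r)) <= tol for r in reps):
            reps.append(m)
    return len(reps)


if __name__ == "__main__":
    X = [[0.0], [0.1], [5.0], [5.1]]
    own = [row[:] for row in X]                   # every point is its own center
    merged = [[0.05], [0.05], [5.05], [5.05]]     # two fused groups
    for name, M in (("own", own), ("merged", merged)):
        print(name, count_clusters(M), round(objective(X, M, 1.0, 0.5), 4))
```

With $\lambda=1$ and $p=0.5$, the fused configuration (2 clusters) scores about 8.95. Keeping every point separate (4 clusters) scores about 9.58. This shows how the $\ell_p$ penalty pushes nearby centers to coincide. With $p<1$, the smoothing error at a zero difference is $(\mu/2)^p$, which shrinks only slowly as $\mu \to 0$. Smoothing methods therefore drive $\mu$ toward zero over the iterations rather than fixing it.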
