Convex covariate clustering for classification

Clustering, like covariate selection for classification, is an important step to compress and interpret the data. However, clustering of covariates is often performed independently of the classification step, which can lead to undesirable clustering results that harm interpretability and compression rate. Therefore, we propose a method that can cluster covariates while taking into account class label information of samples. We formulate the problem as a convex optimization problem which uses both, a-priori similarity information between covariates, and information from class-labeled samples. Like ordinary convex clustering [Chi and Lange, 2015], the proposed method offers a unique global minima making it insensitive to initialization. In order to solve the convex problem, we propose a specialized alternating direction method of multipliers (ADMM), which scales up to several thousands of variables. Furthermore, in order to circumvent computationally expensive cross-validation, we propose a model selection criterion based on approximating the marginal likelihood. Experiments on synthetic and real data confirm the usefulness of the proposed clustering method and the selection criterion.

[1]  James Bailey,et al.  Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance , 2010, J. Mach. Learn. Res..

[2]  Xuelong Li,et al.  Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut , 2020, NeurIPS.

[3]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[4]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[5]  Christopher Potts,et al.  Learning Word Vectors for Sentiment Analysis , 2011, ACL.

[6]  Tomohiro Ando,et al.  Bayesian Model Selection and Statistical Modeling , 2010 .

[7]  Jorge Nocedal,et al.  A Limited Memory Algorithm for Bound Constrained Optimization , 1995, SIAM J. Sci. Comput..

[8]  Eric C. Chi,et al.  Splitting Methods for Convex Clustering , 2013, Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America.

[9]  Edwin R. Hancock,et al.  High-order covariate interacted Lasso for feature selection , 2017, Pattern Recognit. Lett..

[10]  Y. She Sparse regression with exact clustering , 2008 .

[11]  Stephen P. Boyd,et al.  Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding , 2013, Journal of Optimization Theory and Applications.

[12]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[13]  Nipun Kwatra,et al.  Unsupervised Clustering using Pseudo-semi-supervised Learning , 2020, ICLR.

[14]  Éric Gaussier,et al.  Deep k-Means: Jointly Clustering with k-Means and Learning Representations , 2018, Pattern Recognit. Lett..

[15]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[16]  Nicolai Meinshausen,et al.  Relaxed Lasso , 2007, Comput. Stat. Data Anal..

[17]  Olivier Ledoit,et al.  A well-conditioned estimator for large-dimensional covariance matrices , 2004 .

[18]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[19]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[20]  Stephen P. Boyd,et al.  Network Lasso: Clustering and Optimization in Large Graphs , 2015, KDD.

[21]  Stephen P. Boyd,et al.  ECOS: An SOCP solver for embedded systems , 2013, 2013 European Control Conference (ECC).

[22]  P. Tseng Applications of splitting algorithm to decomposition in convex programming and variational inequalities , 1991 .

[23]  Stephen P. Boyd,et al.  CVXPY: A Python-Embedded Modeling Language for Convex Optimization , 2016, J. Mach. Learn. Res..

[24]  Trevor Hastie,et al.  Statistical Learning with Sparsity: The Lasso and Generalizations , 2015 .

[25]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[26]  Niels Richard Hansen,et al.  Sparse group lasso and high dimensional multinomial classification , 2012, Comput. Stat. Data Anal..

[27]  Stephen Boyd,et al.  A Rewriting System for Convex Optimization Problems , 2017, ArXiv.

[28]  Francis R. Bach,et al.  Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties , 2011, ICML.