Simultaneous Clustering and Optimization for Evolving Datasets

Simultaneous clustering and optimization (SCO) has recently drawn much attention because of its wide range of practical applications, and many methods have been proposed to solve it and obtain the optimal model. When a dataset evolves over time, however, these existing methods must update the model frequently to guarantee accuracy, and such updating is computationally infeasible. In this paper, we propose a new formulation of SCO that handles evolving datasets, together with a new variant of the alternating direction method of multipliers (ADMM) that solves it efficiently. We analyze the guarantee of model accuracy theoretically for two specific tasks: ridge regression and convex clustering. Extensive empirical studies confirm the effectiveness of our method.
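
For orientation, one of the two tasks named above, convex clustering, can be sketched with a textbook scaled-form ADMM. This is not the variant proposed in this paper and it does not handle evolving data. It is a minimal illustration of the standard sum-of-norms objective, assuming uniform weights on all pairs of points. The function name `convex_clustering_admm` and the parameters `gamma`, `rho`, and `n_iter` are hypothetical choices made for this sketch.

```python
import numpy as np
from itertools import combinations


def convex_clustering_admm(X, gamma, rho=1.0, n_iter=500):
    """Textbook scaled-form ADMM for sum-of-norms (convex) clustering.

    Minimizes 0.5 * ||X - U||_F^2 + gamma * sum_{i<j} ||u_i - u_j||_2 over U.
    Uniform pair weights are an assumption of this sketch.
    """
    n, d = X.shape
    edges = list(combinations(range(n), 2))
    m = len(edges)

    # Edge-incidence matrix D: row l of D @ U equals u_i - u_j for edge l = (i, j).
    D = np.zeros((m, n))
    for l, (i, j) in enumerate(edges):
        D[l, i], D[l, j] = 1.0, -1.0

    # The U-update solves the same linear system in every iteration.
    A = np.eye(n) + rho * D.T @ D

    U = X.copy()
    V = D @ U              # split variable, one row per edge (V = D U at optimum)
    Y = np.zeros((m, d))   # scaled dual variable for the constraint D U - V = 0

    for _ in range(n_iter):
        # U-update: quadratic subproblem, solved exactly.
        U = np.linalg.solve(A, X + rho * D.T @ (V - Y))
        # V-update: row-wise block soft-thresholding (prox of the l2 norm).
        W = D @ U + Y
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - (gamma / rho) / np.maximum(norms, 1e-12))
        V = scale * W
        # Dual update on the constraint residual.
        Y = Y + D @ U - V
    return U


# Toy data (made up for illustration): two small groups of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
U = convex_clustering_admm(X, gamma=0.5)
print(np.round(U, 3))
```

Rows of `U` that coincide are read as belonging to the same cluster. With `gamma=0` the sketch returns the data unchanged, and a sufficiently large `gamma` fuses every row to the data mean. In this static setting the whole problem must be solved again whenever the data change, which is the cost the abstract refers to.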
