Nonsmooth nonconvex optimization approach to clusterwise linear regression problems

Clusterwise regression consists of finding a number of regression functions, each approximating a subset of the data. In this paper, a new approach for solving clusterwise linear regression problems is proposed, based on a nonsmooth nonconvex formulation. We present an algorithm for minimizing the resulting nonsmooth nonconvex objective function. This algorithm incrementally divides the whole data set into groups, each of which can be easily approximated by one linear regression function. A special procedure is introduced to generate a good starting point for the global optimization problem solved at each iteration of the incremental algorithm. Such an approach allows one to find global or near-global solutions to the problem when the data sets are sufficiently dense. The algorithm is compared with the multistart Späth algorithm on several publicly available data sets for regression analysis.
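To make the formulation concrete, the sketch below assumes the common nonsmooth form of the clusterwise linear regression objective. For data points (a_i, b_i) and k linear functions with coefficients w_j and intercepts c_j, f = Σ_i min_j (⟨w_j, a_i⟩ + c_j − b_i)^2, so each point is charged only to the function that fits it best. This is a minimal Python/NumPy sketch, not the authors' algorithm. The function names are hypothetical, local improvement uses Späth-style alternating reassignment and refitting, and the starting point for each new function (a least-squares fit to the worse-approximated half of the points) is a simple hypothetical stand-in for the paper's special procedure.

```python
import numpy as np


def clr_objective(X, y, coefs):
    """Nonsmooth clusterwise objective: each point contributes the smallest
    squared residual over the k linear functions (rows of coefs = [w, c])."""
    A = np.column_stack([X, np.ones(len(y))])  # append an intercept column
    residuals = A @ coefs.T - y[:, None]       # shape (m, k)
    return float(np.min(residuals ** 2, axis=1).sum())


def spath_refine(X, y, coefs, max_iter=100):
    """Späth-style local search: assign each point to its best-fitting
    function, refit every function by least squares, repeat until stable."""
    A = np.column_stack([X, np.ones(len(y))])
    coefs = np.array(coefs, dtype=float)  # work on a float copy
    labels = None
    for _ in range(max_iter):
        new_labels = np.argmin((A @ coefs.T - y[:, None]) ** 2, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments unchanged: a local solution has been reached
        labels = new_labels
        for j in range(coefs.shape[0]):
            mask = labels == j
            if mask.any():  # an empty group keeps its previous coefficients
                coefs[j] = np.linalg.lstsq(A[mask], y[mask], rcond=None)[0]
    return coefs


def incremental_clr(X, y, k_max):
    """Hypothetical incremental driver: returns [(coefs, objective)] for k = 1..k_max."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    A = np.column_stack([X, np.ones(len(y))])
    # k = 1: a single ordinary least-squares fit to the whole data set
    coefs = np.linalg.lstsq(A, y, rcond=None)[0][None, :]
    history = [(coefs, clr_objective(X, y, coefs))]
    for _ in range(1, k_max):
        # Hypothetical starting point for the new function: fit it to the
        # points that the current functions approximate worst (top half).
        sq_res = np.min((A @ coefs.T - y[:, None]) ** 2, axis=1)
        worst = sq_res >= np.median(sq_res)
        candidate = np.linalg.lstsq(A[worst], y[worst], rcond=None)[0]
        coefs = spath_refine(X, y, np.vstack([coefs, candidate]))
        history.append((coefs, clr_objective(X, y, coefs)))
    return history
```

For example, `incremental_clr(X, y, 3)` on one-dimensional X returns three (coefficients, objective) pairs with coefficient arrays of shapes (1, 2), (2, 2) and (3, 2). Because adding a function can only lower each point's minimum residual, and neither the reassignment nor the refitting step increases f, the objective values in the returned history should be nonincreasing in k, up to rounding.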
