Unsupervised learning of regression mixture models with unknown number of components

ABSTRACT We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The approach consists of penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised: (i) it infers the model parameters and the optimal number of regression mixture components from the data simultaneously, as learning proceeds, rather than in the two-stage scheme of standard model-based clustering, where model selection criteria are applied afterwards; and (ii) unlike the standard EM algorithm for regression mixtures, it does not require an accurate initialization. The approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, confirming its benefit for practical applications.
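The abstract describes the method only at a high level. The following is a minimal sketch of the general idea for the polynomial case, under stated assumptions: every curve starts as its own component, E- and M-steps alternate, and a penalty on the mixing proportions drives redundant components toward zero so that the number of components is learned during fitting instead of being selected afterwards. The entropy-type proportion update (of the kind used in robust EM variants for Gaussian mixtures), the fixed penalty weight `beta`, the pruning threshold `min_prop`, the common initial variance, the simulated data, and the function names `fit_regression_mixture`, `_responsibilities` and `_sq_resid` are all hypothetical choices for illustration. This is not the authors' algorithm.

```python
import numpy as np


def _sq_resid(Y, X, B):
    """Squared residual ||y_i - X b_k||^2 for every curve i and component k."""
    fitted = B @ X.T                                             # (K, m) mean curve of each component
    return ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)  # (n, K)


def _responsibilities(Y, X, B, sigma2, props):
    """E-step: posterior probability that curve i was generated by component k."""
    m = Y.shape[1]
    log_w = (np.log(props) - 0.5 * m * np.log(2 * np.pi * sigma2)
             - _sq_resid(Y, X, B) / (2 * sigma2))
    log_w -= log_w.max(axis=1, keepdims=True)                    # log-sum-exp stabilisation
    tau = np.exp(log_w)
    return tau / tau.sum(axis=1, keepdims=True)


def fit_regression_mixture(x, Y, degree=1, beta=0.05, min_prop=1e-3, n_iter=500):
    """Hypothetical sketch: polynomial regression mixture for n curves Y (n, m)
    sampled on a shared grid x (m,). `beta` is the assumed penalty weight and
    `min_prop` (assumed < 1/n) the proportion below which components are dropped."""
    n, m = Y.shape
    X = np.vander(x, degree + 1, increasing=True)                # design matrix [1, x, ..., x^degree]
    proj = np.linalg.pinv(X)                                     # least-squares operator (X^T X)^-1 X^T
    B = Y @ proj.T                                               # start: one component per curve
    sigma2 = np.full(n, (Y - B @ X.T).var() + 1e-6)              # common initial noise variance
    props = np.full(n, 1.0 / n)                                  # uniform initial proportions
    for _ in range(n_iter):                                      # fixed iteration count for simplicity
        tau = _responsibilities(Y, X, B, sigma2, props)
        # Proportions: EM average plus an entropy-type correction (assumed form)
        # that moves mass from below-average to above-average components.
        neg_entropy = np.sum(props * np.log(props))
        new_props = tau.mean(axis=0) + beta * props * (np.log(props) - neg_entropy)
        new_props = np.clip(new_props, 0.0, None)
        keep = new_props >= min_prop                             # prune collapsed components
        props = new_props[keep] / new_props[keep].sum()
        tau = tau[:, keep]
        # M-step: weighted least squares per component on the shared design matrix.
        nk = np.maximum(tau.sum(axis=0), 1e-12)
        B = (tau.T @ Y / nk[:, None]) @ proj.T
        sigma2 = (tau * _sq_resid(Y, X, B)).sum(axis=0) / (m * nk) + 1e-6
    labels = _responsibilities(Y, X, B, sigma2, props).argmax(axis=1)
    return B, sigma2, props, labels


# Usage on simulated curves: three noisy lines, ten curves each.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
true_lines = [(0.0, 1.0), (10.0, -2.0), (20.0, 3.0)]             # (intercept, slope)
Y = np.vstack([a + b * x + 0.5 * rng.standard_normal((10, x.size)) for a, b in true_lines])
truth = np.repeat(np.arange(3), 10)
B, sigma2, props, labels = fit_regression_mixture(x, Y)
print(len(props), "components kept")
```

With a fixed `beta` the correction slightly favours larger clusters, so a practical version would shrink or adapt `beta` once the number of components stops changing. Because the M-step only needs a design matrix, replacing `np.vander` with a spline or B-spline basis matrix would give the other model variants mentioned in the abstract.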