Advances in Mixture Models

The importance of mixture distributions is underlined by a number of recent books on mixtures, including Lindsay (1995), Böhning (2000), McLachlan and Peel (2000) and Frühwirth-Schnatter (2006), which update earlier books by Everitt and Hand (1981), Titterington et al. (1985) and McLachlan and Basford (1988). In addition, a wide range of publications on mixtures has appeared in this journal since 2003 (which we take here as a milestone, marking the appearance of the first special issue on mixtures), including Hazan et al. (2003), Benton and Krishnamoorthy (2003), Woodward and Sain (2003), Besbeas and Morgan (2004), Jamshidian (2004), Hürlimann (2004), Bohacek and Rozovskii (2004), Tao et al. (2004), Vaz de Melo Mendes and Lopes (2006), Agresti et al. (2006), Bartolucci and Scaccia (2006), D’Elia and Piccolo (2005), Neerchal and Morel (2005), Klar and Meintanis (2005), Bocci et al. (2006), Hu and Sung (2006), Seidel et al. (2006), Nadarajah (2006), Almhana et al. (2006), Congdon (2006), Priebe et al. (2006), and Li and Zha (2006).

In the following we give a brief introduction to the papers contributing novel aspects to this Special Issue. They come from areas as diverse as capture–recapture modelling, likelihood-based cluster analysis, semiparametric mixture modelling of microarray data, latent class analysis and integer lifetime data analysis, to mention just a few.

Mixture models are frequently used in capture–recapture studies for estimating population size (Chao, 1987; Link, 2003; Böhning and Schön, 2005; Böhning et al., 2005; Böhning and Kuhnert, 2006). In this issue, Mao (2007) highlights a variety of sources of difficulty in statistical inference using mixture models, taking a binomial mixture model as an illustration.

Random intercept models for binary data, useful tools for addressing between-subject heterogeneity, are discussed by Caffo et al. (2007).
The nonlinearity of link functions for binary data is blurred in probit models with a normally distributed random intercept because the resulting model implies a probit marginal link as well. Caffo et al. (2007) explore another family of random intercept models in which the distributions associated with the marginal and conditional link functions, as well as the random effect distribution, all belong to the same family.

Formann (2007) extends the latent class approach (a specific discrete multivariate mixture model) to situations where the discrete outcome variables (such as longitudinal binary data) exhibit nonignorable associations and, most importantly, have missing entries, as is typical for repeated observations in longitudinal studies. The modelling also incorporates potential covariates. The approach is illustrated using data from the Muscatine Coronary Risk Factor Study.

The contribution by Grün and Leisch (2007) introduces the R package flexmix, which provides flexible modelling of finite mixtures of regression models using the EM algorithm.

Alfò et al. (2007) consider a semiparametric mixture model for detecting differentially expressed genes in microarray experiments. An important goal of microarray studies is the detection of genes that show significant changes in observed expression when two or more classes of biological samples (e.g. treatment and control) are compared. With the c-fold rule, a gene is declared to be differentially expressed if its average expression level varies by more than a constant factor c (typically 2). Instead, Alfò et al. (2007) introduce a gene-specific random term to control for both dependence among genes and variability with respect to the probability of yielding a fold change over a threshold c. Likelihood-based inference is accomplished with a two-level finite mixture model, while nonparametric Bayesian estimation is performed through the counting distribution of exceedances.
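
As a point of comparison for the model-based approach, the plain c-fold rule mentioned above can be sketched in a few lines. This is only an illustrative sketch and not the method of Alfò et al. (2007): the function name `c_fold_rule`, the dictionary layout of the data, the toy values, and the symmetric treatment of up- and down-regulation are all assumptions made here.

```python
from statistics import mean


def c_fold_rule(treatment, control, c=2.0):
    """Flag genes whose average expression changes by more than a factor c.

    treatment, control: dicts mapping a gene name to a list of positive
    expression values on the raw (not log) scale. This layout is a
    hypothetical choice for the example.
    Returns the set of gene names declared differentially expressed.
    """
    if c <= 1:
        raise ValueError("c must exceed 1")
    flagged = set()
    # Only genes measured in both classes can be compared.
    for gene in treatment.keys() & control.keys():
        ratio = mean(treatment[gene]) / mean(control[gene])
        # Assumption: a change in either direction counts, so a gene is
        # flagged when the ratio is above c or below 1/c.
        if ratio > c or ratio < 1.0 / c:
            flagged.add(gene)
    return flagged


# Toy data (made up for illustration only).
treatment = {"g1": [8.0, 10.0, 12.0], "g2": [5.0, 5.5, 4.5], "g3": [1.0, 1.2, 0.8]}
control = {"g1": [4.0, 4.0, 4.0], "g2": [5.0, 5.0, 5.0], "g3": [4.0, 4.0, 4.0]}
flagged = c_fold_rule(treatment, control)
```

The rule uses a single fixed threshold for every gene and ignores both sampling variability and dependence among genes, which is the limitation the gene-specific random term is meant to address.
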
Mixtures-of-experts models (Jacobs et al., 1991) and their generalization, hierarchical mixtures-of-experts models (Jordan and Jacobs, 1994), have been introduced to account for nonlinearities and other complexities in the data.
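
To illustrate how a mixtures-of-experts model can capture a nonlinearity, the following minimal sketch combines two linear experts through a softmax gating function, so that the predicted mean is the gate-weighted sum of the expert means. The single covariate, the linear form of the experts, the function names, and all parameter values are assumptions chosen for illustration; they are not taken from the papers cited above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(x, gate_params, expert_params):
    """Predicted mean of a mixtures-of-experts model at covariate x.

    gate_params, expert_params: lists of (intercept, slope) pairs, one per
    expert (a hypothetical parameterisation for this sketch).
    The gate weights g_k(x) are softmax-transformed linear scores, and the
    prediction is sum_k g_k(x) * mu_k(x) with linear expert means mu_k(x).
    """
    gates = softmax([a + b * x for a, b in gate_params])
    expert_means = [a + b * x for a, b in expert_params]
    return sum(g * mu for g, mu in zip(gates, expert_means))


# Illustrative parameters: the gate favours expert 0 for negative x and
# expert 1 for positive x, so two straight lines approximate |x|.
gate_params = [(0.0, -5.0), (0.0, 5.0)]
expert_params = [(0.0, -1.0), (0.0, 1.0)]

predictions = [moe_predict(x, gate_params, expert_params) for x in (-2.0, 0.0, 2.0)]
```

Although each expert is linear, the covariate-dependent gate switches smoothly between them, which yields a nonlinear overall regression function.
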

[1]  Carey E. Priebe,et al.  Segmenting magnetic resonance images via hierarchical mixture modelling , 2006, Comput. Stat. Data Anal..

[2]  Alan Agresti,et al.  Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies , 2004, Comput. Stat. Data Anal..

[3]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[4]  Werner Hürlimann,et al.  Fitting Bivariate Cumulative Returns with Copulas , 2002, Comput. Stat. Data Anal..

[5]  Marco Di Zio,et al.  Imputation through finite Gaussian mixture models , 2007, Comput. Stat. Data Anal..

[6]  M. A. Ismail,et al.  Mixture of two inverse Weibull distributions: Properties and estimation , 2007, Comput. Stat. Data Anal..

[7]  Laurent Bordes,et al.  A stochastic EM algorithm for a semiparametric mixture model , 2007, Comput. Stat. Data Anal..

[8]  A. Chao Estimating the population size for capture-recapture data with unequal catchability. , 1987, Biometrics.

[9]  Hongyuan Zha,et al.  Computational Statistics Data Analysis , 2021 .

[10]  Lynette A. Hunt,et al.  Mixture model clustering for mixed data with missing information , 2003, Comput. Stat. Data Anal..

[11]  L Knorr-Held,et al.  Bayesian Detection of Clusters and Discontinuities in Disease Maps , 2000, Biometrics.

[12]  María José García-Zattera,et al.  A Dirichlet process mixture model for the analysis of correlated binary responses , 2007, Comput. Stat. Data Anal..

[13]  Peter Congdon A model for non-parametric spatially varying regression effects , 2006, Comput. Stat. Data Anal..

[14]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine-mediated learning.

[15]  Dankmar Böhning,et al.  Asymptotic Normality in Mixtures of Power Series Distributions , 2005 .

[16]  Wilfried Seidel,et al.  Editorial: recent developments in mixture models , 2003, Comput. Stat. Data Anal..

[17]  Byron J. T. Morgan,et al.  Integrated squared error estimation of normal mixtures , 2004, Comput. Stat. Data Anal..

[18]  Sam Yuan Sung,et al.  A hybrid EM approach to spatial clustering , 2006, Comput. Stat. Data Anal..

[19]  Geoffrey J. McLachlan,et al.  Mixture models : inference and applications to clustering , 1989 .

[20]  Simos G. Meintanis,et al.  Tests for normal mixtures based on the empirical characteristic function , 2005, Comput. Stat. Data Anal..

[21]  Paul H. C. Eilers,et al.  Non-parametric log-concave mixtures , 2007, Comput. Stat. Data Anal..

[22]  Naonori Ueda,et al.  Bayesian model search for mixture models based on optimizing variational bounds , 2002, Neural Networks.

[23]  Hana Sevcikova,et al.  Efficient calculation of the NPMLE of a mixing distribution for mixtures of exponentials , 2006, Comput. Stat. Data Anal..

[24]  Alessio Farcomeni,et al.  Robust semiparametric mixing for detecting differentially expressed genes in microarray experiments , 2007, Comput. Stat. Data Anal..

[25]  Martin A. Tanner,et al.  Modelling nonlinear count time series with local mixtures of Poisson autoregressions , 2007, Comput. Stat. Data Anal..

[26]  Lynette A. Hunt,et al.  Fitting a Mixture Model to Three-mode Three-way Data with Missing Information , 2001, J. Classif..

[27]  A. Durio E. D. Isaia,et al.  A quick procedure for model selection in the case of mixture of normal densities , 2007, Comput. Stat. Data Anal..

[28]  R. Hathaway A Constrained Formulation of Maximum-Likelihood Estimation for Normal Mixture Distributions , 1985 .

[29]  Jeroen K. Vermunt,et al.  Multilevel Latent Class Models , 2003 .

[30]  Dankmar Böhning,et al.  Computer-Assisted Analysis of Mixtures and Applications: Meta-Analysis, Disease Mapping, and Others , 1999 .

[31]  Geoffrey J. McLachlan,et al.  Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution , 2007, Comput. Stat. Data Anal..

[32]  D. Böhning,et al.  Nonparametric maximum likelihood estimation of population size based on the counting distribution , 2005 .

[33]  K. Krishnamoorthy,et al.  Computing discrete mixtures of continuous distributions: noncentral chisquare, noncentral t , 2003, Comput. Stat. Data Anal..

[34]  Lynette A. Hunt,et al.  Fitting a Mixture Model to Three-Mode Three-Way Data with Categorical and Continuous Variables , 1999 .

[35]  Dankmar Böhning,et al.  Mixture models for capture-recapture count data , 2005, Stat. Methods Appl..

[36]  H Christopher Frey,et al.  Quantification of Variability and Uncertainty Using Mixture Distributions: Evaluation of Sample Size, Mixing Weights, and Separation Between Components , 2004, Risk analysis : an official publication of the Society for Risk Analysis.

[37]  Iven Van Mechelen,et al.  Constrained Latent Class Analysis of Three-Way Three-Mode Data , 2002, J. Classif..

[38]  Chang Xuan Mao,et al.  Estimating population sizes for capture-recapture sampling with binomial mixtures , 2007, Comput. Stat. Data Anal..

[39]  Stephan Bohacek,et al.  A diffusion model of roundtrip time , 2004, Comput. Stat. Data Anal..

[40]  D. M. Titterington,et al.  Variational approximations in Bayesian model selection for finite mixture distributions , 2007, Comput. Stat. Data Anal..

[41]  Peter M. Steiner,et al.  Classification of large data sets with mixture models via sufficient EM , 2007, Comput. Stat. Data Anal..

[42]  Geoffrey J. McLachlan,et al.  Finite Mixture Models , 2019, Annual Review of Statistics and Its Application.

[43]  Jeroen K. Vermunt A hierarchical mixture model for clustering three-way data sets , 2007, Comput. Stat. Data Anal..

[44]  Bernard Garel,et al.  Recent asymptotic results in testing for mixtures , 2007, Comput. Stat. Data Anal..

[45]  Dimitris Karlis,et al.  Confidence intervals of the hazard rate function for discrete distributions using mixtures , 2007, Comput. Stat. Data Anal..

[46]  Brian Caffo,et al.  Flexible random intercept models for binary outcomes using mixtures of normals , 2007, Comput. Stat. Data Anal..

[47]  Francesco Bartolucci,et al.  The use of mixtures for dealing with non-normal regression errors , 2004, Comput. Stat. Data Anal..

[48]  Adrian Corduneanu,et al.  Variational Bayesian Model Selection for Mixture Distributions , 2001 .

[49]  Geoffrey J. McLachlan,et al.  Modelling high-dimensional data by mixtures of factor analyzers , 2003, Comput. Stat. Data Anal..

[50]  Saralees Nadarajah Information matrices for Laplace and Pareto mixtures , 2006, Comput. Stat. Data Anal..

[51]  Carlos J. Perez,et al.  Bayesian analysis of finite mixtures of multinomial and negative-multinomial distributions , 2007, Comput. Stat. Data Anal..

[52]  W. Link Nonidentifiability of Population Size from Capture‐Recapture Data with Heterogeneous Detection Probabilities , 2003, Biometrics.

[53]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[54]  C. Robert,et al.  Deviance information criteria for missing data models , 2006 .

[55]  B. Lindsay Mixture models : theory, geometry, and applications , 1995 .

[56]  A. Yashin,et al.  Correlated individual frailty: an advantageous approach to survival analysis of bivariate data. , 1995, Mathematical population studies.

[57]  Hans C van Houwelingen,et al.  Point and interval estimation of the population size using the truncated Poisson regression model , 2003 .

[58]  Sylvia Frühwirth-Schnatter,et al.  Finite Mixture and Markov Switching Models , 2006 .

[59]  Friedrich Leisch,et al.  Fitting finite mixtures of generalized linear regressions in R , 2007, Comput. Stat. Data Anal..

[60]  Maurizio Vichi,et al.  A mixture model for the classification of three-way proximity data , 2006, Comput. Stat. Data Anal..

[61]  Angela D'Elia,et al.  A mixture model for preferences data analysis , 2005, Comput. Stat. Data Anal..

[62]  Anton K. Formann Mixture analysis of multivariate categorical data with covariates and missing entries , 2007, Comput. Stat. Data Anal..

[63]  Ning-Zhong Shi,et al.  Drug risk assessment with determining the number of sub-populations under finite mixture normal models , 2004, Comput. Stat. Data Anal..

[64]  A. F. Smith,et al.  Statistical analysis of finite mixture distributions , 1986 .

[65]  Hedibert Freitas Lopes,et al.  Data driven estimates for mixtures , 2004, Comput. Stat. Data Anal..

[66]  Dankmar Böhning,et al.  Computer-Assisted Analysis of Mixtures and Applications , 2000, Technometrics.

[67]  Udi E. Makov,et al.  Robustness via a mixture of exponential power distributions , 2003, Comput. Stat. Data Anal..

[68]  Salvatore Ingrassia,et al.  Constrained monotone EM algorithms for finite mixture of multivariate Gaussians , 2007, Comput. Stat. Data Anal..

[69]  Nagaraj K. Neerchal,et al.  An improved method for the computation of maximum likeliood estimates for multinomial overdispersion models , 2005, Comput. Stat. Data Anal..

[70]  Mortaza Jamshidian,et al.  On Algorithms for Restricted Maximum Likelihood Estimation , 2002, Comput. Stat. Data Anal..

[71]  Stephan R. Sain,et al.  Testing for outliers from a mixture distribution when some data are missing , 2003, Comput. Stat. Data Anal..

[72]  Peter G M van der Heijden,et al.  Point and Interval Estimation of the Population Size Using a Zero‐Truncated Negative Binomial Regression Model , 2008, Biometrical journal. Biometrische Zeitschrift.

[73]  Angela Montanari,et al.  Independent factor discriminant analysis , 2008, Comput. Stat. Data Anal..

[74]  P Schlattmann,et al.  Space-time mixture modelling of public health data. , 2000, Statistics in medicine.

[75]  Dankmar Böhning,et al.  Equivalence of Truncated Count Mixture Distributions and Mixtures of Truncated Count Distributions , 2006, Biometrics.

[76]  Bradley P. Carlin,et al.  Bayesian measures of model complexity and fit , 2002 .

[77]  Mohammad Reza Meshkani,et al.  Bayesian analysis of an inverse Gaussian correlated frailty model , 2007, Comput. Stat. Data Anal..