A new R package for Bayesian estimation of multivariate normal mixtures allowing for selection of the number of components and interval-censored data

The R package mixAK is introduced. It implements routines for semiparametric density estimation through normal mixtures using Markov chain Monte Carlo (MCMC) methodology. Besides producing the MCMC output, the package computes posterior summary statistics for important characteristics of the fitted distribution and computes and visualizes the posterior predictive density. For the estimated models, the penalized expected deviance (PED) and the deviance information criterion (DIC) are computed directly, which allows the number of mixture components to be selected. Additionally, multivariate right-, left- and interval-censored observations are allowed. For univariate problems, the reversible jump MCMC algorithm has been implemented and can be used for joint estimation of the mixture parameters and the number of mixture components. The core MCMC routines have been implemented in C++ and linked to R to ensure reasonable computational speed. We briefly review the implemented algorithms and illustrate the use of the package on three real examples of differing complexity.
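
For concreteness, the model class described above can be written as follows. This is only a sketch in assumed notation: the symbols K, w_j, \mu_j, \Sigma_j, \sigma_j and the interval bounds l, u are illustrative and are not taken from the package documentation. The first line is a K-component normal mixture density; the second is, for the univariate case, the likelihood contribution of an observation known only to lie in the interval (l, u].

```latex
% Assumed notation: \varphi is the (multivariate) normal density,
% \Phi is the standard normal cumulative distribution function.
\[
  p(\boldsymbol{y}) \;=\; \sum_{j=1}^{K} w_j\,
    \varphi(\boldsymbol{y} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j),
  \qquad w_j > 0, \quad \sum_{j=1}^{K} w_j = 1,
\]
\[
  \Pr(l < Y \le u) \;=\; \int_{l}^{u} p(y)\,\mathrm{d}y
  \;=\; \sum_{j=1}^{K} w_j
    \left[ \Phi\!\left(\frac{u - \mu_j}{\sigma_j}\right)
         - \Phi\!\left(\frac{l - \mu_j}{\sigma_j}\right) \right].
\]
```

Under the same assumptions, letting l tend to minus infinity gives the left-censored case, and letting u tend to plus infinity gives the right-censored case.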
