Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions.

Skew-normal and skew-t distributions have proved useful for capturing skewness and kurtosis in data directly, without transformation. Recently, finite mixtures of such distributions have been considered as a more general tool for handling heterogeneous data with asymmetric behavior across subpopulations. We consider such mixture models for both univariate and multivariate data. This allows robust modeling of high-dimensional, multimodal, and asymmetric data generated by popular biotechnological platforms such as flow cytometry. We develop Bayesian inference based on data augmentation and Markov chain Monte Carlo (MCMC) sampling. In addition to the latent allocations, data augmentation is based on a stochastic representation of the skew-normal distribution in terms of a random-effects model with truncated normal random effects. For finite mixtures of skew-normal distributions, this leads to a Gibbs sampling scheme that draws from standard densities only. This MCMC scheme is extended to mixtures of skew-t distributions by representing the skew-t distribution as a scale mixture of skew-normal distributions. As an important application, we demonstrate how the method provides a new computational framework for automated analysis of high-dimensional flow cytometric data. Using multivariate skew-normal and skew-t mixture models, we model non-Gaussian cell populations rigorously and directly, without transformation or projection to lower dimensions.
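
The two representations mentioned above can be illustrated for the univariate case with a minimal sketch. It assumes the standard parameterization with location xi, scale omega, shape alpha, and degrees of freedom nu, which may differ from the notation used in the paper. A skew-normal draw is written as xi + omega * (delta * Z + sqrt(1 - delta^2) * eps), where delta = alpha / sqrt(1 + alpha^2), Z is a standard normal truncated to [0, inf), and eps is an independent standard normal. A skew-t draw is then obtained by dividing the scale by sqrt(W) with W ~ Gamma(nu/2, rate nu/2). The function names are hypothetical, and the code only simulates from the distributions; it is not the Gibbs sampler itself.

```python
import math
import random


def sample_skew_normal(xi, omega, alpha, rng):
    """Draw one skew-normal variate via the random-effects representation.

    Assumed parameterization: location xi, scale omega, shape alpha.
    """
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z = abs(rng.gauss(0.0, 1.0))   # standard normal truncated to [0, inf)
    eps = rng.gauss(0.0, 1.0)      # independent standard normal error
    return xi + omega * (delta * z + math.sqrt(1.0 - delta * delta) * eps)


def sample_skew_t(xi, omega, alpha, nu, rng):
    """Draw one skew-t variate as a scale mixture of skew normals."""
    # gammavariate takes (shape, scale); scale 2/nu corresponds to rate nu/2.
    w = rng.gammavariate(nu / 2.0, 2.0 / nu)
    return sample_skew_normal(xi, omega / math.sqrt(w), alpha, rng)


def skew_normal_mean(xi, omega, alpha):
    """Theoretical mean of the skew-normal: xi + omega * delta * sqrt(2/pi)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    return xi + omega * delta * math.sqrt(2.0 / math.pi)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_skew_normal(1.0, 2.0, 3.0, rng) for _ in range(20000)]
    print("empirical mean:  ", sum(draws) / len(draws))
    print("theoretical mean:", skew_normal_mean(1.0, 2.0, 3.0))
```

Conditional on Z (and W for the skew-t), the observation is Gaussian, which is why augmenting the data with these latent variables can yield conditional posteriors of standard form.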
