Infinite Mixtures of Infinite Factor Analysers

Factor-analytic Gaussian mixture models are often employed as a model-based approach to clustering high-dimensional data. Typically, the numbers of clusters and latent factors must be fixed in advance of model fitting. Models are therefore fitted over a range of candidate values, and the pair which optimises some model selection criterion is then chosen. For computational reasons, models in which the number of latent factors differs across clusters are rarely considered. Here the infinite mixture of infinite factor analysers (IMIFA) model is introduced. IMIFA employs a Pitman-Yor process prior to facilitate automatic inference of the number of clusters, using the stick-breaking construction and a slice sampler. Furthermore, IMIFA employs multiplicative gamma process shrinkage priors to allow cluster-specific numbers of factors, which are automatically inferred via an adaptive Gibbs sampler. IMIFA is presented as the flagship of a family of factor-analytic mixture models, providing flexible approaches to clustering high-dimensional data. Applications to a benchmark data set, metabolomic spectral data, and a manifold-learning example involving handwritten digits illustrate the IMIFA model and its advantageous features: IMIFA obviates the need for model selection criteria, reduces the computational burden associated with searching the model space, improves clustering performance by allowing cluster-specific numbers of factors, and quantifies uncertainty in the numbers of clusters and cluster-specific factors.
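To make the two priors named above more concrete, the following is a minimal sketch of drawing from them, assuming the textbook forms rather than the IMIFA software itself. The Pitman-Yor weights use the standard stick-breaking construction with discount d and concentration alpha, and the slice-sampler helper shows why only finitely many sticks are ever needed: an unbroken stick cannot exceed the leftover mass, so once that mass falls below the smallest slice variable no further component can be occupied. The loadings follow the Bhattacharya and Dunson form of the multiplicative gamma process, and the "redundant column" rule is a simplified version of the adaptive truncation idea. All function names, hyperparameter values (a1, a2, nu) and thresholds (eps, prop) are hypothetical and chosen only for illustration.

```python
import math
import random


def pitman_yor_sticks(alpha, discount, rng):
    """Yield (pi_k, mass left unallocated after stick k) for k = 1, 2, ...

    Pitman-Yor stick-breaking with discount d and concentration alpha:
    V_k ~ Beta(1 - d, alpha + k * d) and pi_k = V_k * prod_{l<k} (1 - V_l).
    """
    if not (0.0 <= discount < 1.0 and alpha > -discount):
        raise ValueError("need 0 <= discount < 1 and alpha > -discount")
    remaining, k = 1.0, 0
    while True:
        k += 1
        v = rng.betavariate(1.0 - discount, alpha + k * discount)
        yield remaining * v, remaining * (1.0 - v)
        remaining *= 1.0 - v


def sticks_for_slice(u_min, alpha, discount, rng):
    """Break sticks until the unallocated mass falls below u_min = min_i u_i.

    In a Walker-style slice sampler, observation i can only be assigned to a
    component with pi_k > u_i. An unbroken stick never exceeds the leftover
    mass, so the sticks returned here are the only candidates this sweep.
    """
    if u_min <= 0.0:
        raise ValueError("u_min must be positive")
    weights = []
    for weight, left in pitman_yor_sticks(alpha, discount, rng):
        weights.append(weight)
        if left < u_min:
            return weights


def mgp_loadings(n_vars, n_factors, a1, a2, nu, rng):
    """Draw a p x q loadings matrix from a multiplicative gamma process prior.

    delta_1 ~ Ga(a1, 1), delta_l ~ Ga(a2, 1) for l >= 2, tau_h = prod_{l<=h} delta_l,
    phi_jh ~ Ga(nu/2, rate nu/2), lambda_jh ~ N(0, 1 / (phi_jh * tau_h)).
    """
    taus, tau = [], 1.0
    for h in range(n_factors):
        # random.gammavariate takes (shape, scale); scale 1.0 means rate 1.
        tau *= rng.gammavariate(a1 if h == 0 else a2, 1.0)
        taus.append(tau)
    # Local precisions phi_jh: shape nu/2, scale 2/nu (i.e. rate nu/2).
    return [
        [rng.gauss(0.0, 1.0 / math.sqrt(rng.gammavariate(nu / 2, 2 / nu) * t)) for t in taus]
        for _ in range(n_vars)
    ]


def effective_factors(loadings, eps=0.1, prop=1.0):
    """Count columns not flagged as redundant by a simple truncation rule.

    A column is redundant when at least a fraction `prop` of its loadings
    satisfy |lambda_jh| < eps (illustrative thresholds).
    """
    n_vars = len(loadings)
    n_factors = len(loadings[0]) if n_vars else 0
    kept = 0
    for h in range(n_factors):
        small = sum(1 for row in loadings if abs(row[h]) < eps)
        if small / n_vars < prop:
            kept += 1
    return kept


rng = random.Random(2024)
weights = sticks_for_slice(u_min=1e-3, alpha=0.5, discount=0.25, rng=rng)
print(len(weights), "sticks; leftover mass", 1.0 - sum(weights))
lam = mgp_loadings(n_vars=50, n_factors=10, a1=2.0, a2=3.0, nu=3.0, rng=rng)
print("non-redundant columns:", effective_factors(lam))
```

With a2 > 1 the column precisions tau_h tend to grow with h, so later columns of the loadings matrix are increasingly shrunk towards zero. This is what allows a truncation rule of this kind to discard columns and, applied within each cluster, to yield cluster-specific numbers of factors.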
