Kernel learning approaches for summarising and combining posterior similarity matrices

When using Markov chain Monte Carlo (MCMC) algorithms to perform inference for Bayesian clustering models, such as mixture models, the output is typically a sample of clusterings (partitions) drawn from the posterior distribution. In practice, a key challenge is how to summarise this output. Here we build upon the notion of the posterior similarity matrix (PSM) to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models. A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically motivated kernel matrices that capture the clustering structure present in the data. This observation enables us to employ a range of kernel methods to obtain summary clusterings, and to otherwise exploit the information summarised by PSMs. For example, if we have multiple PSMs, each corresponding to a different dataset on a common set of statistical units, we may use standard methods for combining kernels to perform integrative clustering. We may moreover embed PSMs within predictive kernel models to perform outcome-guided data integration. We demonstrate the performance of the proposed methods through a range of simulation studies as well as two real data applications. R code is available at this https URL.
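
To illustrate the positive semi-definiteness observation, the following is a minimal sketch in Python rather than the authors' R code. It assumes the usual Monte Carlo estimate of the PSM, in which entry (i, j) is the fraction of sampled clusterings that place items i and j in the same cluster. Under that assumption, each sampled clustering contributes a co-clustering matrix that is the Gram matrix of one-hot cluster indicators, so their average is positive semi-definite. The function names and the toy label vectors are hypothetical and chosen only for illustration.

```python
import random


def posterior_similarity_matrix(samples):
    """Estimate the PSM from sampled clusterings.

    `samples` is a list of label vectors, one per MCMC draw. Entry (i, j)
    of the result is the fraction of draws in which items i and j share
    a cluster label.
    """
    n = len(samples[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    num_draws = len(samples)
    return [[c / num_draws for c in row] for row in counts]


def quadratic_form(matrix, x):
    """Return x^T M x, which is non-negative for every x when M is PSD."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))


# Toy stand-in for MCMC output: three sampled partitions of five items.
# These labels are made up for illustration, not taken from the paper.
samples = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
psm = posterior_similarity_matrix(samples)

# Numerical spot check (not a proof): evaluate the quadratic form at
# random vectors and record the smallest value seen.
rng = random.Random(0)
min_qf = min(
    quadratic_form(psm, [rng.uniform(-1.0, 1.0) for _ in range(5)])
    for _ in range(1000)
)
```

Because the resulting matrix is symmetric and positive semi-definite, it can in principle be passed to any method that accepts a precomputed kernel matrix, which is the property the kernel-based summaries described above rely on.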
