Prior Information Based Bayesian Infinite Mixture Model

Unsupervised learning methods have been tremendously successful in extracting knowledge from genomics data generated by high-throughput experimental assays. However, analyzing each dataset in isolation, without incorporating potentially informative prior knowledge, limits the utility of such procedures. Here we present a novel probabilistic model and computational algorithm for semi-supervised learning from genomics data. The probabilistic model is an extension of the Bayesian semiparametric Gaussian Infinite Mixture Model (GIMM), and the model parameters are estimated using a Markov chain Monte Carlo (MCMC) algorithm. The utility of the procedure in improving the precision of cluster analysis by incorporating prior information is demonstrated in a simulation study and in an analysis of real-world genomics data.
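
For readers unfamiliar with infinite mixture models, the following is a minimal sketch of the Chinese restaurant process, the prior over partitions that a Dirichlet process mixture induces and that lets the number of clusters grow with the data. It is an illustration only, not the paper's algorithm: the function name `crp_assignments`, the concentration parameter `alpha`, and the use of a fixed seed are assumptions made for this example, and the way the extended model incorporates prior information is not described in the abstract and is not reproduced here.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    alpha (> 0) is a hypothetical concentration parameter: larger values
    make opening a new cluster more likely.
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index of item i
    counts = []   # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=20, alpha=1.0)
    print("labels:", labels)
    print("cluster sizes:", counts)
```

In a full mixture model, these prior weights would be multiplied by the likelihood of each item's expression profile under each cluster, and the assignments would be resampled repeatedly within an MCMC scheme such as Gibbs sampling.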
