Bayesian Hierarchical Model for Large-Scale Covariance Matrix Estimation

Many bioinformatics problems implicitly depend on estimating a large-scale covariance matrix. Traditional approaches tend to produce estimates with high variance and low accuracy because of "overfitting." We cast the large-scale covariance matrix estimation problem in the Bayesian hierarchical model framework and introduce dependency between covariance parameters. We demonstrate the advantages of our approaches over traditional approaches using simulations and OMICS data analysis.
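
The overfitting problem mentioned above can be illustrated with a minimal sketch. It is not the hierarchical model of this paper. It only shows why the plain sample estimate is unreliable when the number of variables is large relative to the number of samples. The function names, sample sizes, and seed are assumptions chosen for illustration. All variables are simulated as independent, so every true off-diagonal correlation is zero, yet the largest sample correlation is far from zero when few samples are available.

```python
import math
import random


def pearson(x, y):
    """Plain sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n_samples, n_vars, seed=0):
    """Largest absolute sample correlation among independent variables.

    Hypothetical helper: the true correlation of every pair is zero,
    so any large value returned here is pure estimation noise.
    """
    rng = random.Random(seed)
    # Each inner list holds the n_samples observations of one variable.
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        for _ in range(n_vars)
    ]
    return max(
        abs(pearson(columns[i], columns[j]))
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
    )


if __name__ == "__main__":
    # Few samples, many variables: the typical omics setting.
    print("n=10,   p=30:", round(max_spurious_correlation(10, 30), 3))
    # Many samples: the same estimator behaves much better.
    print("n=1000, p=30:", round(max_spurious_correlation(1000, 30), 3))
```

With 30 variables there are 435 pairwise correlations to estimate, so with only 10 samples some of them look strong by chance alone. Regularized estimators, including Bayesian ones, are designed to damp this kind of noise.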
