A comprehensive comparison of association estimators for gene network inference algorithms

MOTIVATION: Gene network inference (GNI) algorithms enable researchers to explore the interactions among genes and gene products. The central step of a GNI algorithm is to compute association scores between genes. Although several association estimators are used across different applications, none is commonly accepted as the best one for GNI. In this study, 27 different association estimators were reviewed, and the 14 most promising among them were evaluated using three popular GNI algorithms on two synthetic and two real biological datasets, the latter from the bacterium Escherichia coli and the yeast Saccharomyces cerevisiae. The influence of the Copula Transform (CT) pre-processing operation on the performance of the estimators was also examined. This study is expected to assist researchers working with GNI applications.

RESULTS: The B-spline, Pearson-based Gaussian and Spearman-based Gaussian association score estimators outperform the others on all datasets in terms of both inference performance and runtime. In addition, when the CT operation is used, the inference performance of the estimators mostly increases, especially on the two synthetic datasets. Detailed evaluations and discussions are given in the experimental results.

CONTACT: gokmen.altay@bahcesehir.edu.tr

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
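To make the estimators named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the CT step is a per-gene rank transform to the unit interval and that the Pearson-based Gaussian estimator is the standard closed form for bivariate Gaussian mutual information, I = -0.5 ln(1 - r^2). The function names `copula_transform` and `gaussian_mi_matrix` are hypothetical, and the exact definitions used in the study may differ.

```python
import numpy as np

def copula_transform(expr):
    """Rank-transform each gene (row) into (0, 1].

    Assumption: ties are broken by order of appearance; the study
    may handle ties differently.
    """
    expr = np.asarray(expr, dtype=float)
    n_samples = expr.shape[1]
    # Double argsort yields 0-based ranks within each row.
    ranks = expr.argsort(axis=1).argsort(axis=1) + 1
    return ranks / n_samples

def gaussian_mi_matrix(expr):
    """Pairwise Gaussian mutual information from Pearson correlation.

    Uses I = -0.5 * ln(1 - r^2), which is exact only for jointly
    Gaussian variables. Rows are genes, columns are samples.
    """
    r = np.corrcoef(expr)
    np.fill_diagonal(r, 0.0)  # self-association is not of interest
    # Clip so that |r| = 1 gives a large finite score instead of infinity.
    r2 = np.clip(r ** 2, 0.0, 1.0 - 1e-12)
    return -0.5 * np.log(1.0 - r2)

# Toy expression matrix: 3 genes x 4 samples (illustrative values only).
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 1.0, 3.0, 2.0],
])

pearson_scores = gaussian_mi_matrix(expr)
# Pearson correlation on ranks equals Spearman correlation when there
# are no ties, so this gives a Spearman-based Gaussian score.
spearman_scores = gaussian_mi_matrix(copula_transform(expr))
```

Because Pearson correlation computed on ranks coincides with Spearman correlation in the absence of ties, applying the same Gaussian formula after the rank transform gives one plausible reading of the Spearman-based Gaussian estimator. The resulting symmetric score matrix is the kind of input a GNI algorithm would then threshold or prune.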
