Study of Meta-analysis strategies for network inference using information-theoretic approaches

Background: Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene expression data has accumulated in public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has therefore naturally become a standard procedure in modern computational biology. Such analysis is usually more robust than traditional approaches that analyse individual datasets, which suffer from experimental biases and low sample numbers. To date, there are two main strategies for this problem: the first (“data merging”) merges all datasets and then infers a single GRN, whereas the second (“networks ensemble”) infers a GRN from each dataset separately and then aggregates them using ensemble rules (such as ranksum or weightsum). Unfortunately, a thorough comparison of these two approaches is lacking.

Results: In this work, we present another meta-analysis approach for inferring GRNs from multiple studies. The proposed approach, suited to methods based on pairwise measures such as correlation or mutual information, consists of two steps: aggregating the matrices of pairwise measures from every dataset, then extracting the network from the resulting meta-matrix. We then evaluate the two commonly used approaches and our proposed approach with a systematic set of experiments based on in silico benchmarks.

Conclusions: We propose a first systematic evaluation of different strategies for reverse engineering GRNs from multiple datasets. Experimental results strongly suggest that aggregating matrices of pairwise dependencies is a better strategy for network inference than the two commonly used ones.
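The two-step idea of aggregating pairwise matrices and then extracting a network can be illustrated with a minimal sketch. The abstract specifies neither the aggregation rule nor the extraction method, so the choices here are assumptions made purely for illustration: absolute Pearson correlation as the pairwise measure, an unweighted mean as the aggregation rule, and a top-k edge cut as the extraction step. The function names (`pairwise_matrix`, `aggregate_matrices`, `extract_network`) and the toy data are hypothetical.

```python
import numpy as np


def pairwise_matrix(data):
    """Absolute Pearson correlation between genes (columns); rows are samples.

    Assumption: correlation stands in for any pairwise measure
    (the abstract also mentions mutual information).
    """
    m = np.abs(np.corrcoef(data, rowvar=False))
    np.fill_diagonal(m, 0.0)  # ignore self-dependencies
    return m


def aggregate_matrices(datasets):
    """Step 1: combine per-dataset matrices into one meta-matrix.

    Assumption: an unweighted mean; other aggregation rules are possible.
    """
    return np.mean([pairwise_matrix(d) for d in datasets], axis=0)


def extract_network(meta, n_edges):
    """Step 2: keep the n_edges strongest undirected gene pairs.

    Assumption: a simple top-k cut replaces a real inference algorithm.
    """
    rows, cols = np.triu_indices_from(meta, k=1)
    order = np.argsort(meta[rows, cols])[::-1][:n_edges]
    return [(int(rows[i]), int(cols[i])) for i in order]


# Toy data: three "studies" with 4 genes each; gene 1 depends on gene 0.
rng = np.random.default_rng(0)


def make_dataset(n_samples, offset):
    g0 = rng.normal(size=n_samples)
    g1 = g0 + 0.1 * rng.normal(size=n_samples)
    g2 = rng.normal(size=n_samples)
    g3 = rng.normal(size=n_samples)
    # The offset mimics a study-specific shift in expression level.
    return np.column_stack([g0, g1, g2, g3]) + offset


datasets = [make_dataset(30, 0.0), make_dataset(30, 5.0), make_dataset(30, -3.0)]
meta = aggregate_matrices(datasets)
edges = extract_network(meta, n_edges=1)
print(edges)
```

Because each pairwise matrix is computed within a single dataset, study-specific shifts such as the offsets above do not enter the dependency estimates. Whether this property explains the reported advantage is not stated in the abstract.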
