minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information

ResultsThis paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves in order to compare the inferred network with a reference one.ConclusionThe package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.

[1]  Gianluca Bontempi,et al.  On the impact of entropy estimator in transcriptional regulatory network inference , 2008 .

[2]  C. N. Liu,et al.  Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.

[3]  Kevin Kontos,et al.  Information-Theoretic Inference of Large Transcriptional Regulatory Networks , 2007, EURASIP J. Bioinform. Syst. Biol..

[4]  L. Györfi,et al.  Nonparametric entropy estimation. An overview , 1997 .

[5]  William N. Venables,et al.  Modern Applied Statistics with S , 2010 .

[6]  M K Markey,et al.  Application of the mutual information criterion for feature selection in computer-aided diagnosis. , 2001, Medical physics.

[7]  Stan Szpakowicz,et al.  Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation , 2006, Australian Conference on Artificial Intelligence.

[8]  William Bialek,et al.  Entropy and information in neural spike trains: progress on the sampling problem. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[9]  Ross Ihaka,et al.  Gentleman R: R: A language for data analysis and graphics , 1996 .

[10]  Brian D. Ripley,et al.  Modern Applied Statistics with S Fourth edition , 2002 .

[11]  Adam A. Margolin,et al.  Reverse engineering of regulatory networks in human B cells , 2005, Nature Genetics.

[12]  Mark Goadrich,et al.  The relationship between Precision-Recall and ROC curves , 2006, ICML.

[13]  K. Strimmer,et al.  Statistical Applications in Genetics and Molecular Biology A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics , 2011 .

[14]  Jean YH Yang,et al.  Bioconductor: open software development for computational biology and bioinformatics , 2004, Genome Biology.

[15]  Ron Kohavi,et al.  Supervised and Unsupervised Discretization of Continuous Features , 1995, ICML.

[16]  Carsten O. Daub,et al.  Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data , 2004, BMC Bioinformatics.

[17]  Thomas Lengauer,et al.  Diversity and complexity of HIV-1 drug resistance: A bioinformatics approach to predicting phenotype from genotype , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[18]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[19]  Liam Paninski,et al.  Estimation of Entropy and Mutual Information , 2003, Neural Computation.

[20]  Peter Grassberger,et al.  Entropy estimation of symbol sequences. , 1996, Chaos.

[21]  Korbinian Strimmer,et al.  An empirical Bayes approach to inferring large-scale gene association networks , 2005, Bioinform..

[22]  Ron Kohavi,et al.  The Case against Accuracy Estimation for Comparing Induction Algorithms , 1998, ICML.

[23]  Igor Vajda,et al.  Estimation of the Information by an Adaptive Partitioning of the Observation Space , 1999, IEEE Trans. Inf. Theory.

[24]  Kathleen Marchal,et al.  SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms , 2006, BMC Bioinformatics.

[25]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[26]  Liang Wu,et al.  Classifying n-back EEG data using entropy and mutual information features , 2007, ESANN.

[27]  A. Butte,et al.  Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[28]  Huan Liu,et al.  Discretization: An Enabling Technique , 2002, Data Mining and Knowledge Discovery.

[29]  Geoffrey I. Webb,et al.  On Why Discretization Works for Naive-Bayes Classifiers , 2003, Australian Conference on Artificial Intelligence.

[30]  PaninskiLiam Estimation of entropy and mutual information , 2003 .

[31]  David A. Bell,et al.  Learning Bayesian networks from data: An information-theory based approach , 2002, Artif. Intell..

[32]  Timothy S Gardner,et al.  Reverse-engineering transcription control networks. , 2005, Physics of life reviews.

[33]  I S Kohane,et al.  Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[34]  A. Hasman,et al.  Probabilistic reasoning in intelligent systems: Networks of plausible inference , 1991 .

[35]  J. Collins,et al.  Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles , 2007, PLoS biology.

[36]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Chris H. Q. Ding,et al.  Minimum redundancy feature selection from microarray gene expression data , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[38]  Bernd Freisleben,et al.  Greedy and Local Search Heuristics for Unconstrained Binary Quadratic Programming , 2002, J. Heuristics.

[39]  Robert Gentleman,et al.  Network structures and algorithms in Bioconductor , 2005, Bioinform..

[40]  M. Reinders,et al.  Genetic network modeling. , 2002, Pharmacogenomics.

[41]  Chris Wiggins,et al.  ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context , 2004, BMC Bioinformatics.

[42]  Raphail E. Krichevsky,et al.  The performance of universal encoding , 1981, IEEE Trans. Inf. Theory.

[43]  Mark Craven,et al.  Markov Networks for Detecting Overalpping Elements in Sequence Data , 2004, NIPS.