NATbox: a network analysis toolbox in R

Background: There has been recent interest in capturing functional relationships (FRs) from high-throughput assays using suitable computational techniques. FRs elucidate how genes work in concert as a system rather than as independent entities, and hence may provide preliminary insights into biological pathways and signalling mechanisms. Bayesian structure learning (BSL) techniques and their extensions have been used successfully for modelling FRs from expression profiles. Such techniques are especially useful for discovering undocumented FRs and for investigating non-canonical signalling mechanisms and cross-talk between pathways. The objective of the present study is to develop a graphical user interface (GUI), NATbox: Network Analysis Toolbox, in the R language. NATbox houses a battery of BSL algorithms together with suitable statistical tools for modelling FRs as acyclic networks from gene expression profiles and for their subsequent analysis.

Results: NATbox is a menu-driven, open-source GUI implemented in the R statistical language for modelling and analysis of FRs from gene expression profiles. It provides options to (i) impute missing observations in the given data; (ii) model FRs and network structure from gene expression profiles using a battery of BSL algorithms, and identify robust dependencies using a bootstrap procedure; (iii) present the FRs as acyclic graphs for visualization and investigate their topological properties using network analysis metrics; (iv) retrieve FRs of interest from published literature and subsequently use them as structural priors in BSL; and (v) enhance the scalability of BSL across high-dimensional data by parallelizing the bootstrap routines.

Conclusion: NATbox provides a menu-driven GUI for modelling and analysis of FRs from gene expression profiles. By incorporating readily available functions from existing R packages, it minimizes redundancy and improves the reproducibility, transparency and sustainability that are characteristic of open-source environments. NATbox is especially suited to interdisciplinary researchers and biologists who have minimal programming experience and would like to use systems biology approaches without delving into the algorithmic aspects. The GUI provides parameter recommendations for the various menu options, including default parameter choices. NATbox can also serve as a demonstration and teaching tool in graduate and undergraduate courses in systems biology. It has been tested successfully under the Windows and Linux operating systems. The source code, installation instructions and an accompanying tutorial can be found at http://bioinformatics.ualr.edu/natboxWiki/index.php/Main_Page.
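The bootstrap step mentioned in the Results (identifying robust dependencies by relearning the network on resampled data) can be illustrated with a minimal sketch. This is not NATbox code and does not use its R functions. It is written in Python with the standard library only, and every name in it (`pearson`, `learn_edges`, `bootstrap_edge_confidence`, `robust_edges`, `toy_profiles`) is hypothetical. To keep the example self-contained, a simple correlation threshold stands in for a BSL algorithm, so the learned edges are undirected associations rather than a Bayesian network. The thresholds and the toy data are likewise assumptions chosen for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def learn_edges(samples, threshold=0.6):
    """Stand-in structure learner (NOT a BSL algorithm): link gene pairs with |r| >= threshold."""
    n_genes = len(samples[0])
    columns = [[row[g] for row in samples] for g in range(n_genes)]
    return {
        (i, j)
        for i, j in combinations(range(n_genes), 2)
        if abs(pearson(columns[i], columns[j])) >= threshold
    }


def bootstrap_edge_confidence(samples, learner, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which the learner recovers each edge."""
    rng = random.Random(seed)
    n = len(samples)
    counts = {}
    for _ in range(n_boot):
        # Resample the observations (rows) with replacement, then relearn the structure.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        for edge in learner(resample):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: count / n_boot for edge, count in counts.items()}


def robust_edges(confidence, cutoff=0.8):
    """Edges whose bootstrap confidence reaches the cutoff."""
    return sorted(edge for edge, c in confidence.items() if c >= cutoff)


def toy_profiles(n_samples=80, seed=1):
    """Synthetic expression profiles: gene 1 tracks gene 0, gene 2 is independent."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        g0 = rng.gauss(0, 1)
        g1 = g0 + rng.gauss(0, 0.2)
        g2 = rng.gauss(0, 1)
        rows.append((g0, g1, g2))
    return rows


if __name__ == "__main__":
    confidence = bootstrap_edge_confidence(toy_profiles(), learn_edges)
    print(robust_edges(confidence))
```

Because each resample is processed independently of the others, the loop in `bootstrap_edge_confidence` is the part that lends itself to parallel execution, which is consistent with option (v) above. Swapping `learn_edges` for an actual structure learning routine would leave the rest of the sketch unchanged.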
