An Integrative Scoring Approach to Identify Transcriptional Regulations Controlling Lung Surfactant Homeostasis

Transcriptional regulatory network identification is both a fundamental challenge in systems biology and an important practical application of data mining and machine learning. In this study, we propose an integrative scoring approach based on semi-supervised learning to tackle this challenge and predict transcriptional regulations. In validation, our approach outperforms a state-of-the-art label propagation method and achieves AUC scores above 0.96 on three datasets from microarray experiments. We constructed a map of the transcriptional regulatory network that controls lung surfactant homeostasis. The predicted and prioritized transcriptional regulations were further validated experimentally, and many other predicted novel regulations may serve as candidates for future experimental investigation.
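
The abstract does not spell out the integrative scoring method, so the sketch below only illustrates the general setting it is evaluated in: a generic graph label-propagation baseline, written in the style of the local-and-global-consistency iteration of Zhou et al., and the AUC metric used to score a ranked list of candidate regulations. It is not the authors' method, nor necessarily the label propagation variant they compared against. The function names, the toy graph, the seed labels, `alpha`, and the iteration count are all hypothetical choices made for illustration.

```python
import math

def label_propagation(weights, seeds, alpha=0.8, iters=100):
    """Hypothetical baseline: spread scores from known positives over a graph.

    weights: n x n symmetric list of lists with non-negative edge weights
    seeds:   list of 0/1 initial labels (1 = known positive, e.g. a known regulation)
    Iterates F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2.
    """
    n = len(weights)
    deg = [sum(row) for row in weights]
    # Symmetric normalisation; isolated nodes (degree 0) receive no neighbour signal.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    s = [[inv_sqrt[i] * weights[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    f = [float(y) for y in seeds]
    for _ in range(iters):
        # Each node mixes its neighbours' current scores with its own seed label.
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * seeds[i]
             for i in range(n)]
    return f

def auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [sc for sc, y in zip(scores, labels) if y == 1]
    neg = [sc for sc, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy graph (hypothetical): chain 0-1-2 and a separate pair 3-4; node 0 is a known positive.
W = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
scores = label_propagation(W, [1, 0, 0, 0, 0])
# Held-out labels for nodes 1-4: nodes 1 and 2 are positives, 3 and 4 are negatives.
print(auc(scores[1:], [1, 1, 0, 0]))
```

Because nodes 3 and 4 are not connected to the seed, their scores stay at zero while nodes 1 and 2 receive positive scores, so every held-out positive outranks every negative and the AUC is 1.0. On real data the graph would be built from evidence such as co-expression, and the AUC would be computed over many held-out regulations.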
