Detecting reliable gene interactions by a hierarchy of Bayesian network classifiers

The main purpose of a gene interaction network is to map relationships among genes that are not directly observable in a genomic study. DNA microarrays measure the expression of thousands of genes simultaneously, and these data are the numerical basis from which gene networks are induced. In this paper, we propose a new approach to building gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling. The interactions induced by the Bayesian classifiers are based both on the expression levels and on the phenotype information of the supervised variable. Feature selection and bootstrap resampling add reliability and robustness to the overall process by removing false positive findings. The consensus among all the induced models produces a hierarchy of dependences and, thus, of variables. Biologists can set the depth level of the model hierarchy, so the set of interactions and genes involved can range from sparse to dense. Experimental results show that these networks perform well on classification tasks. The biological validation matches previous biological findings and opens new hypotheses for future studies.
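
The consensus step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data and the edge-induction rule are all hypothetical. In particular, `toy_induce_edges` is a simple stand-in that ignores the phenotype; in the proposed approach that role would be played by feature selection followed by a Bayesian network classifier. The sketch only shows how edge frequencies over bootstrap resamples yield a hierarchy that can be cut at a chosen support level.

```python
import random
from collections import Counter
from itertools import combinations


def bootstrap_edge_consensus(samples, induce_edges, n_boot=100, seed=0):
    """Fraction of bootstrap models in which each edge appears.

    `induce_edges` is any function mapping a list of samples to a list of
    edges; here it is a hypothetical placeholder for the model learner.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Bootstrap resample: draw len(samples) items with replacement.
        resample = [rng.choice(samples) for _ in samples]
        # Count each edge at most once per induced model.
        counts.update(set(induce_edges(resample)))
    return {edge: c / n_boot for edge, c in counts.items()}


def edges_at_level(consensus, min_support):
    """Edges whose bootstrap support is at least `min_support`."""
    return sorted(e for e, f in consensus.items() if f >= min_support)


def toy_induce_edges(resample, min_agreement=0.8):
    """Stand-in learner (assumption): link two genes whose discretized
    expression values agree in at least `min_agreement` of the samples."""
    genes = sorted(resample[0])
    edges = []
    for a, b in combinations(genes, 2):
        agree = sum(s[a] == s[b] for s in resample) / len(resample)
        if agree >= min_agreement:
            edges.append((a, b))
    return edges


# Toy discretized expression data (hypothetical): g1 and g2 always agree.
samples = [
    {"g1": 1, "g2": 1, "g3": 0},
    {"g1": 0, "g2": 0, "g3": 0},
    {"g1": 1, "g2": 1, "g3": 1},
    {"g1": 0, "g2": 0, "g3": 1},
]
consensus = bootstrap_edge_consensus(samples, toy_induce_edges)
strict = edges_at_level(consensus, 1.0)    # sparse network
relaxed = edges_at_level(consensus, 0.01)  # denser network
```

Lowering `min_support` plays the part of moving down the hierarchy: the strict network is always contained in the relaxed one, so the user can trade sparsity against coverage.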
