Applications of Bayesian network models in predicting types of hematological malignancies

Network analysis is the preferred approach for detecting subtle but coordinated changes in the expression of an interacting, related set of genes. We introduce a novel method based on the analysis of coexpression networks and Bayesian networks, and we use it to classify two types of hematological malignancies, namely acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). Our classifier has an accuracy of 93%, a precision of 98%, and a recall of 90% on the training dataset (n = 366), which outperforms the results reported by other scholars on the same dataset. Although our training dataset consists of microarray data, our model also performs well on the RNA-Seq test dataset (n = 74, accuracy = 89%, precision = 88%, recall = 98%), which confirms that eigengenes are robust with respect to expression profiling technology. These eigengene signatures are useful for classification and for correctly predicting the diagnosis, and they might also provide valuable information about the underlying biology of the diseases. Our network analysis approach is generalizable and can be useful for classifying other diseases based on gene expression profiles. Our previously published Pigengene package, which can be used to conveniently fit a Bayesian network to gene expression data, is publicly available through Bioconductor.
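As a minimal illustration of two quantities mentioned above, the following Python sketch computes an eigengene for one gene module and the accuracy, precision, and recall of a set of predictions. It assumes the common definition of an eigengene as the first principal component of a module's expression matrix. The function names `module_eigengene` and `classification_metrics` are hypothetical, and this is not the Pigengene API, which is an R package.

```python
import numpy as np


def module_eigengene(expr):
    """Return one eigengene value per sample for a single module.

    expr: array of shape (samples, genes) with the expression of the
    genes in one module. Assumption: the eigengene is taken to be the
    first principal component of the column-centered matrix.
    """
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=0)
    # The first right singular vector holds the per-gene weights.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    weights = vt[0]
    # The sign of a principal component is arbitrary; flip it so that
    # the gene weights sum to a non-negative value.
    if weights.sum() < 0:
        weights = -weights
    return centered @ weights


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary classifier."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
        "recall": tp / (tp + fn) if (tp + fn) else float("nan"),
    }


if __name__ == "__main__":
    # Toy data only; these numbers are not taken from the study.
    truth = ["AML", "AML", "AML", "MDS", "MDS"]
    predicted = ["AML", "AML", "MDS", "MDS", "AML"]
    print(classification_metrics(truth, predicted, positive="AML"))
```

In a pipeline of the kind described, one eigengene per module would replace thousands of individual gene measurements as the input features of the classifier. Summarizing each module in this way is one plausible reason a model trained on microarray data could transfer to RNA-Seq data.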
