Supervised learning methods in modeling of CD4+ T cell heterogeneity

Background: Modeling of the immune system, a highly non-linear and complex system, requires practical and efficient data analytic approaches. The immune system is composed of heterogeneous cell populations and hundreds of cell types, such as neutrophils, eosinophils, macrophages, dendritic cells, T cells, and B cells. Each cell type is highly diverse and can be further differentiated into subsets with unique and overlapping functions. For example, CD4+ T cells can differentiate into Th1, Th2, Th17, Th9, Th22, Treg, Tfh, and Tr1 subsets, each of which plays a different role in the immune system. Computational systems biology approaches can be used to study the molecular mechanisms of cell differentiation; however, they often require building complex intracellular signaling models with a large number of equations to accurately represent intracellular pathways and biochemical reactions. Furthermore, studying the immune system entails integrating complex processes that occur at different temporal and spatial scales.

Methods: This study presents and compares four supervised learning methods for modeling CD4+ T cell differentiation: Artificial Neural Networks (ANN), Random Forest (RF), Support Vector Machines (SVM), and Linear Regression (LR). Supervised learning methods could reduce the complexity of intracellular models based on Ordinary Differential Equations (ODEs) by focusing only on the input and output cytokine concentrations. In addition, this modeling framework can be efficiently integrated into multiscale models.

Results: Our results demonstrate that ANN and RF outperform the other two methods. Furthermore, ANN and RF have comparable performance when applied to in silico data with and without added noise. The trained models were also able to reproduce dynamic behavior when applied to experimental data; in four out of five cases, the ANN and RF models correctly predicted the outcome of the system. Finally, a comparison of the running times of the different methods showed that ANN is considerably faster than RF.

Conclusions: Using machine learning instead of ODE-based methods reduces the computational complexity of the system and allows one to gain a deeper understanding of the complex interplay between the different related entities.
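The input-output framing described in the Methods can be illustrated with a small sketch. It is not the study's model or data: `toy_model` is a hypothetical Hill-type dose response standing in for the steady-state output of an ODE model, the concentration range and noise level are arbitrary assumptions, and only the simplest of the four methods (linear regression, fitted by ordinary least squares) is shown. The ANN, RF, and SVM learners would be trained on the same input-output pairs using a suitable library.

```python
import random


def toy_model(x, k=0.5, n=2):
    """Hypothetical Hill-type response standing in for an ODE model's output.

    x is an input cytokine concentration (arbitrary units); k and n are
    illustrative constants, not values taken from the study.
    """
    return x**n / (k**n + x**n)


def make_dataset(size, noise_sd=0.0, seed=0):
    """Sample inputs uniformly and record the (optionally noisy) outputs."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0) for _ in range(size)]
    ys = [toy_model(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return (total / len(observed)) ** 0.5


def evaluate(noise_sd):
    """Train on (possibly noisy) data, test against noise-free outputs."""
    train_x, train_y = make_dataset(200, noise_sd, seed=1)
    test_x, test_y = make_dataset(100, 0.0, seed=2)
    slope, intercept = fit_linear(train_x, train_y)
    predictions = [slope * x + intercept for x in test_x]
    return rmse(predictions, test_y)


if __name__ == "__main__":
    for sd in (0.0, 0.05):
        print(f"noise sd={sd}: test RMSE={evaluate(sd):.3f}")
```

Because the toy response is non-linear, the linear fit leaves a residual error even on noise-free data, which is the kind of gap that more flexible learners such as ANN or RF are meant to close.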
