Predicting gene expression in T cell differentiation from histone modifications and transcription factor binding affinities by linear mixture models

Background: The differentiation process from stem cells to fully differentiated cell types is controlled by the interplay of chromatin modifications and transcription factor activity. Histone modifications and transcription factors frequently act in a multifunctional manner: a given DNA motif or histone modification can convey either transcriptional repression or activation, depending on its location in the promoter and on the other regulatory signals surrounding it.

Results: To account for the possible multifunctionality of regulatory signals, we model the observed gene expression patterns by a mixture of linear regression models. We apply the approach to identify the histone modifications and transcription factors guiding gene expression in differentiated CD4+ T cells. The method improves gene expression prediction relative to a single linear model, as used by many previous approaches. Moreover, it recovers the known role of the modifications H3K4me3 and H3K27me3 in activating cell-specific genes, as well as the role of some transcription factors related to CD4+ T cell differentiation.
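To make the idea of a mixture of linear regression models concrete, the following is a minimal sketch, not the authors' implementation. It assumes Gaussian noise within each component and fits the model with a plain EM loop; the function name `fit_linear_mixture`, its parameters, and the small ridge term are illustrative assumptions. Rows of `X` would play the role of genes, columns the role of regulatory features (for example histone modification levels or transcription factor affinities), and `y` the expression values.

```python
import numpy as np

def fit_linear_mixture(X, y, n_components=2, n_iter=100, seed=0):
    """Fit a mixture of linear regressions with EM (illustrative sketch).

    Returns per-component coefficients (intercept first), noise variances,
    mixing proportions, and the per-sample responsibilities.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])  # prepend an intercept column

    # Start from random soft assignments of samples to components.
    resp = rng.dirichlet(np.ones(n_components), size=n)
    coef = np.zeros((n_components, d + 1))
    sigma2 = np.ones(n_components)
    pi = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # M-step: weighted least squares for each component.
        for k in range(n_components):
            r = resp[:, k]
            # Tiny ridge term (an assumption) keeps the system well conditioned.
            A = Xb.T @ (r[:, None] * Xb) + 1e-8 * np.eye(d + 1)
            coef[k] = np.linalg.solve(A, Xb.T @ (r * y))
            resid = y - Xb @ coef[k]
            sigma2[k] = max((r @ resid**2) / (r.sum() + 1e-12), 1e-8)
            pi[k] = r.mean()

        # E-step: posterior probability of each component for each sample.
        resid = y[:, None] - Xb @ coef.T  # shape (n, n_components)
        log_p = (np.log(np.clip(pi, 1e-12, None))
                 - 0.5 * np.log(2.0 * np.pi * sigma2)
                 - 0.5 * resid**2 / sigma2)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

    return coef, sigma2, pi, resp
```

In this sketch, a feature whose coefficient is positive in one component and negative in another corresponds to a signal that activates one group of genes and represses another, which is the kind of multifunctional behavior a single linear model would average away.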
