Prediction and validation of protein–protein interactors from genome-wide DNA-binding data using a knowledge-based machine-learning approach

The ability to accurately predict the DNA targets and interacting cofactors of transcriptional regulators from genome-wide data can significantly advance our understanding of gene regulatory networks. NKX2-5 is a homeodomain transcription factor that sits high in the cardiac gene regulatory network and is essential for normal heart development. We previously identified genomic targets for NKX2-5 in mouse HL-1 atrial cardiomyocytes using DNA-adenine methyltransferase identification (DamID). Here, we apply machine learning algorithms and propose a knowledge-based feature selection method for predicting NKX2-5 protein–protein interactions based on motif grammar in genome-wide DNA-binding data. We assessed model performance using leave-one-out cross-validation and a completely independent DamID experiment performed with replicates. In addition to recovering previously described NKX2-5-interacting proteins, including GATA, HAND and TBX family members, we identified several novel interactors, and validated direct protein–protein interactions between NKX2-5 and retinoid X receptor (RXR), paired-related homeobox (PRRX) and Ikaros zinc fingers (IKZF) using the yeast two-hybrid assay. We also found that the interaction with RXRα was altered for NKX2-5 variants carrying mutations found in congenital heart disease (Q187H, R189G and R190H). These findings highlight an intuitive approach to extracting protein–protein interaction information for transcription factors from DNA-binding experiments.
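As an illustration of the leave-one-out cross-validation mentioned above, the following minimal Python sketch holds out each genomic region in turn, fits a classifier on the remaining regions, and scores the held-out prediction. It is not the authors' pipeline: the nearest-centroid classifier, the two-motif count features and the toy data are all assumptions chosen only to keep the example self-contained.

```python
from typing import List, Sequence

# Hypothetical toy data: each row holds counts of two illustrative motifs
# in one genomic region; label 1 = bound region, 0 = background region.
X: List[List[float]] = [[3, 2], [4, 1], [3, 3], [0, 1], [1, 0], [0, 0]]
y: List[int] = [1, 1, 1, 0, 0, 0]


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(train_X, train_y, x) -> int:
    """Nearest-centroid classifier (a stand-in for any real model)."""
    best_label, best_dist = -1, float("inf")
    for label in sorted(set(train_y)):
        members = [r for r, lab in zip(train_X, train_y) if lab == label]
        d = sq_dist(centroid(members), x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def leave_one_out_accuracy(X, y) -> float:
    """Hold out each sample once, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    print(leave_one_out_accuracy(X, y))
```

Because every sample is used for testing exactly once, the estimate uses all available regions, which suits small data sets; an independent experiment, as described in the abstract, remains the stronger check on generalization.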
