A Massively Parallel Pipeline to Clone DNA Variants and Examine Molecular Phenotypes of Human Disease Mutations

Understanding the functional relevance of DNA variants is essential for all exome and genome sequencing projects. However, current mutagenesis cloning protocols require Sanger sequencing and are thus prohibitively costly and labor-intensive. We describe a massively parallel site-directed mutagenesis approach, “Clone-seq”, which leverages next-generation sequencing to rapidly and cost-effectively generate a large number of mutant alleles. Using Clone-seq, we further develop a comparative interactome-scanning pipeline that integrates high-throughput GFP, yeast two-hybrid (Y2H), and mass spectrometry assays to systematically evaluate the functional impact of mutations on protein stability and interactions. We use this pipeline to show that disease mutations on protein-protein interaction interfaces are significantly more likely than those away from interfaces to disrupt the corresponding interactions. We also find that mutation pairs with similar molecular phenotypes, in terms of both protein stability and interactions, are significantly more likely to cause the same disease than pairs with different molecular phenotypes. This validates the in vivo biological relevance of our high-throughput GFP and Y2H assays and indicates that both assays can be used to identify candidate disease mutations in the future. The general scheme of our experimental pipeline can be readily extended to other types of interactome-mapping methods to comprehensively evaluate the functional relevance of all DNA variants, including those in non-coding regions.
