Y2H-SCORES: A statistical framework to infer protein-protein interactions from next-generation yeast-two-hybrid sequence data

Interactomes embody one of the most efficient representations of cellular behavior by revealing function through protein associations. Building such models requires reproducible and efficient high-throughput implementation of the prevailing molecular techniques to ascertain interacting partners. Among them, yeast two-hybrid (Y2H) and its high-throughput version, termed next-generation interaction screening (NGIS), are promising approaches for generating extensive protein-protein interaction networks. However, challenges remain in mining reliable information from these screens, which limits their broader implementation. Here, we describe a statistical framework, designated Y2H-SCORES, for analyzing high-throughput Y2H screens that considers key aspects of experimental design, normalization, and controls. Three quantitative ranking scores were implemented to identify interacting partners: 1) significant enrichment under selection for positive interactions, 2) degree of interaction specificity among multi-bait comparisons, and 3) selection of in-frame interactors. Using simulation and an empirical dataset, we provide a quantitative assessment to predict interacting partners under a wide range of experimental scenarios, facilitating independent confirmation by one-to-one bait-prey tests. Simulation of Y2H-NGIS identified conditions that maximize interactor mining efficiency, which can be achieved with protocols such as prey library normalization, maintenance of larger culture volumes, and replication of experimental treatments. Y2H-SCORES can be implemented in different yeast-based interaction screenings, accelerating data analytics of those technologies. Proof-of-concept was demonstrated by discovery and validation of a novel interaction between the barley powdery mildew effector, AVRA13, and the vesicle-mediated thylakoid membrane biogenesis protein, HvTHF1.
Author Summary

Organisms respond to their environment through networks of interacting proteins and other biomolecules. Many in vitro and in vivo techniques have been used to investigate these interacting proteins. Among these, yeast two-hybrid (Y2H) has been adapted for use in combination with next-generation sequencing (NGS) to approach protein-protein interactions on a genome-wide scale. The fusion of these two methods has been termed next-generation interaction screening, abbreviated as Y2H-NGIS. However, the massive and diverse datasets resulting from this technology have presented unique challenges to analysis. To address these challenges, we optimized the computational and statistical evaluation of Y2H-NGIS to provide metrics that identify high-confidence interacting proteins under a variety of dataset scenarios. Our proposed framework can be extended to different yeast-based interaction settings, utilizing the general principles of enrichment, specificity, and in-frame prey selection to accurately assemble protein-protein interaction networks. Lastly, we showed how the pipeline works experimentally by identifying and validating a novel interaction between the barley powdery mildew effector AVRA13 and the barley vesicle-mediated thylakoid membrane biogenesis protein, HvTHF1. Y2H-SCORES software is available in the GitHub repository at https://github.com/Wiselab2/Y2H-SCORES.
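The three principles named above (enrichment, specificity, and in-frame prey selection) can be illustrated with a minimal Python sketch. It is not the Y2H-SCORES implementation and does not reproduce its statistical models. The log-ratio enrichment, the difference-based specificity, the in-frame fraction, and the mean-rank aggregation are simplified stand-ins chosen for illustration. All function names, the pseudocount, and the read counts are hypothetical.

```python
import math

def cpm(counts):
    """Scale raw read counts to counts per million within one sample."""
    total = sum(counts.values())
    return {prey: 1e6 * n / total for prey, n in counts.items()}

def enrichment(selected, non_selected, pseudo=1.0):
    """Toy enrichment: log2 ratio of normalized abundance, selected vs. non-selected."""
    s, n = cpm(selected), cpm(non_selected)
    return {prey: math.log2((s[prey] + pseudo) / (n[prey] + pseudo)) for prey in s}

def specificity(enrich_by_bait, bait):
    """Toy specificity: enrichment with this bait minus the best enrichment with any other bait."""
    others = [b for b in enrich_by_bait if b != bait]
    return {prey: e - max(enrich_by_bait[b][prey] for b in others)
            for prey, e in enrich_by_bait[bait].items()}

def in_frame_fraction(fusion_reads):
    """Toy in-frame score: share of fusion-junction reads that keep the prey in frame."""
    return {prey: (in_frame / total if total else 0.0)
            for prey, (in_frame, total) in fusion_reads.items()}

def mean_rank(*scores):
    """Average the per-score ranks (1 = best); a lower mean rank is a stronger candidate."""
    preys = list(scores[0])
    rank_sum = {p: 0 for p in preys}
    for score in scores:
        ordered = sorted(preys, key=lambda p: score[p], reverse=True)
        for rank, prey in enumerate(ordered, start=1):
            rank_sum[prey] += rank
    return {p: rank_sum[p] / len(scores) for p in preys}

# Hypothetical read counts: one non-selected library and two baits under selection.
non_selected = {"preyA": 100, "preyB": 100, "preyC": 800}
selected = {
    "bait1": {"preyA": 900, "preyB": 50, "preyC": 50},
    "bait2": {"preyA": 50, "preyB": 100, "preyC": 850},
}
# Hypothetical (in-frame reads, total fusion reads) per prey for bait1.
fusion_reads = {"preyA": (90, 100), "preyB": (10, 40), "preyC": (5, 50)}

enrich = {bait: enrichment(counts, non_selected) for bait, counts in selected.items()}
ranking = mean_rank(enrich["bait1"], specificity(enrich, "bait1"),
                    in_frame_fraction(fusion_reads))
print(sorted(ranking, key=ranking.get))  # candidate preys for bait1, best first
```

In this toy setup a prey ranks highly for a bait only if it is enriched under selection, is not similarly enriched with the other bait, and is mostly fused in frame. A real analysis would also need replicates and a model of count variability, which this sketch omits.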
