Spotlite: web application and augmented algorithms for predicting co-complexed proteins from affinity purification-mass spectrometry data.

Protein-protein interactions defined by affinity purification and mass spectrometry (APMS) suffer from high false discovery rates. Consequently, lists of potential interactions must be pruned of contaminants before network construction and interpretation, historically an expensive, time-intensive, and error-prone task. In recent years, numerous computational methods have been developed to identify genuine interactions among the hundreds of candidates. Here, comparative analysis of three popular algorithms, HGSCore, CompPASS, and SAINT, revealed complementarity in their classification accuracies, which is supported by their divergent scoring strategies. We improved each algorithm, raising the area under the receiver operating characteristic curve by an average of 16%, by integrating a variety of indirect data known to correlate with established protein-protein interactions, including mRNA coexpression, gene ontologies, domain-domain binding affinities, and homologous protein interactions. Each APMS scoring approach was incorporated into a separate logistic regression model along with the indirect features; the resulting three classifiers demonstrate improved performance on five diverse APMS data sets. To facilitate APMS data scoring within the scientific community, we created Spotlite, a user-friendly and fast web application. Within Spotlite, data can be scored with the augmented classifiers, annotated, and visualized (http://cancer.unc.edu/majorlab/software.php). The utility of the Spotlite platform to reveal physical, functional, and disease-relevant characteristics within APMS data is established through a focused analysis of the KEAP1 E3 ubiquitin ligase.
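
The idea of combining one APMS score with indirect features in a logistic regression model, and judging the result by the area under the ROC curve, can be illustrated with a minimal sketch. This is not Spotlite's implementation: the data are synthetic, the feature names (`apms_score`, `coexpression`, `go_similarity`) and all function names are hypothetical, and the model is fit with plain gradient descent purely for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, row):
    """Predicted probability that a bait-prey pair is a true interaction."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, y in zip(rows, labels):
            err = predict(weights, bias, row) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * row[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def auc(scores, labels):
    """Area under the ROC curve: the probability that a random positive
    outscores a random negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy data (an assumption, not real APMS data): each row is
# [apms_score, coexpression, go_similarity]; true pairs have shifted means.
rng = random.Random(0)


def toy_pair():
    y = 1 if rng.random() < 0.5 else 0
    return [rng.gauss(float(y), 1.0) for _ in range(3)], y


data = [toy_pair() for _ in range(500)]
train, test = data[:300], data[300:]

weights, bias = fit_logistic([r for r, _ in train], [y for _, y in train])

test_labels = [y for _, y in test]
auc_apms = auc([r[0] for r, _ in test], test_labels)
auc_combined = auc([predict(weights, bias, r) for r, _ in test], test_labels)

print(f"AUC, APMS score alone:   {auc_apms:.3f}")
print(f"AUC, augmented model:    {auc_combined:.3f}")
```

On this toy data the augmented model should rank held-out pairs better than the APMS score alone, because each indirect feature carries independent signal; how much real data benefit depends on the features and data set.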