ProHits: an integrated software platform for mass spectrometry-based interaction proteomics

Affinity purification coupled with mass spectrometric identification (AP-MS) is now a method of choice for charting novel protein-protein interactions, and has been applied in a large number of both small-scale and high-throughput studies [1]. However, general and intuitive computational tools for sample tracking, AP-MS data analysis and annotation have not kept pace with rapid improvements in methods and instrumentation. To address this need, we developed the ProHits laboratory information management system (LIMS). ProHits is a complete open-source software solution for MS-based interaction proteomics that manages the entire pipeline, from raw MS data files to fully annotated protein-protein interaction datasets. It was designed to provide an intuitive user interface from the biologist's perspective, and can accommodate multiple instruments within a facility, multiple user groups, multiple laboratory locations and any number of parallel projects. ProHits handles projects of any scale and supports common experimental pipelines, including those based on gel separation, gel-free analysis, and multi-dimensional protein or peptide separation. ProHits is a web application written in PHP and accessed through an HTML browser client; its data are stored in a MySQL database running on a dedicated server. The complete ProHits software solution consists of two main components, a Data Management module and an Analyst module (Fig. 1a; see Supplementary Fig. 1 for data structure tables). These modules are supported by an Admin Office module, in which projects, instruments, user permissions and protein databases are managed (Supplementary Fig. 2). A simplified version of the software suite (“ProHits Lite”), consisting only of the Analyst module and Admin Office, is also available for users who already have a data management solution or who receive pre-computed search results from a core MS facility (Supplementary Fig. 3).
A step-by-step installation package, installation guide and user manual (see Supplementary Information) are available on the ProHits website (www.prohitsMS.com).

Figure 1 Overview of ProHits. (a) Modular organisation of ProHits. The Data Management module backs up all raw mass spectrometry data from acquisition computers, and handles data conversion and database searches. The Analyst module organizes data by project, bait, ...

In the Data Management module, raw data from all mass spectrometers in a facility or user group are copied on a schedule to a single secure storage location. Data are organized by instrument, with the folder and file organization mirroring that of the acquisition computer, and ProHits assigns a unique identifier to each folder and file. Log files and visual indicators of the current connection status help users monitor the entire system. The Data Management module also records the use of each instrument for reporting purposes (Supplementary Fig. 4–5). Raw MS files can be converted automatically to the appropriate file formats using the open-source ProteoWizard converters (http://proteowizard.sourceforge.net/). Converted files may be subjected to manual or automated database searches, followed by statistical analysis of the search results, on any user-defined schedule; search engine parameters are also recorded to facilitate reporting and compliance with MIAPE guidelines [2]. Mascot [3], X!Tandem [4] and the Trans-Proteomic Pipeline (TPP) [5] are fully integrated with ProHits via linked search engine servers (Supplementary Fig. 6–7).

The Analyst module organizes data by project, bait, experiment and/or sample, for gel-based or gel-free approaches (Fig. 1a; for a description of a gel-based project, see Supplementary Fig. 8). To create and analyze a gel-free affinity purification sample, the user specifies the bait gene name and species.
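The scheduled backup step described above can be pictured with a minimal Python sketch. It is not ProHits code (ProHits is written in PHP): it assumes the acquisition folder is reachable as a local path, copies new or changed files into an instrument-specific folder that mirrors the source layout, and derives an identifier from the instrument name and relative path. The function name `mirror_raw_files`, the size-based change check and the identifier scheme are all illustrative assumptions.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def mirror_raw_files(source_root: Path, storage_root: Path, instrument: str) -> dict[str, str]:
    """Copy raw files from an acquisition folder into instrument-specific storage.

    Illustrative sketch only (not ProHits code). The folder layout under
    `source_root` is mirrored under `storage_root / instrument`, and a mapping
    of relative path -> identifier is returned.
    """
    dest_root = storage_root / instrument
    registry: dict[str, str] = {}
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue  # directories are recreated implicitly when files are copied
        rel = src.relative_to(source_root)
        dest = dest_root / rel
        # Assumed change check: copy only if the file is missing or its size differs.
        if not dest.exists() or dest.stat().st_size != src.stat().st_size:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
        # Hypothetical ID scheme: instrument name plus a short hash of the relative path,
        # so the same file receives the same identifier on every scheduled run.
        digest = hashlib.sha1(f"{instrument}/{rel.as_posix()}".encode()).hexdigest()[:12]
        registry[rel.as_posix()] = f"{instrument}-{digest}"
    return registry


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        acquisition = Path(tmp) / "acquisition"
        (acquisition / "2010_06").mkdir(parents=True)
        (acquisition / "2010_06" / "run01.raw").write_bytes(b"spectra")
        print(mirror_raw_files(acquisition, Path(tmp) / "storage", "instrument_A"))
```

In a real deployment such a function would be triggered by a scheduler (for example cron), and the returned registry would be written to the database rather than kept in memory.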
ProHits automatically retrieves the amino acid sequence and other annotation from its associated database. Bait annotation may then be modified as necessary, for example to specify the presence of an epitope tag or mutation (Supplementary Fig. 9). A comprehensive annotation page tracks experimental details (Supplementary Fig. 10), including descriptions of the Sample, Affinity Purification protocol, Peptide Preparation methodology and LC-MS/MS procedures. Experimental descriptions can be entered from controlled vocabulary lists via drop-down menus, which facilitates compliance with annotation guidelines such as MIAPE [6] and MIMIx [7] and simplifies the organization and retrieval of data files. The Experimental Detail page also accepts free-text notes, for example to cross-reference laboratory notebook pages, to record experimental details not captured in other sections, or to describe deviations from reference protocols, as well as links to gel images or other file types. Once an experiment is created, multiple samples may be linked to it, such as technical replicates of the same sample or chromatographic fractions derived from the same preparation. All baits, experiments, samples and protocols are assigned unique identifiers. Once created, each sample is linked to both its raw files and its database search results. For high-throughput (HTP) projects with many samples, samples can be annotated automatically by means of a standardized file naming system (Supplementary Fig. 11), or files may be linked manually. Alternatively, search results obtained outside of ProHits with the X!Tandem or Mascot search engines can be imported manually into the Analyst module (Supplementary Fig. 12). The ProHits Lite version likewise supports uploading of external search results for users with an established MS data management system.
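The bait, experiment and sample hierarchy, together with automatic sample annotation from file names, can be sketched as follows. The naming pattern `<bait>_<experiment>_<sample>[_suffix].<ext>` and the `E`/`S` identifier prefixes are hypothetical and are not the convention shown in Supplementary Fig. 11; the class and function names are likewise illustrative. Files that do not match the pattern are returned so that they can be linked manually, mirroring the manual fallback described above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical naming convention (NOT the scheme in Supplementary Fig. 11):
#   <BaitGene>_<ExperimentID>_<SampleID>[_<anything>].<ext>, e.g. CDC28_E1_S1_rep1.raw
NAME_PATTERN = re.compile(
    r"^(?P<bait>[A-Za-z0-9-]+)_(?P<exp>E\d+)_(?P<sample>S\d+)(?:_.*)?\.\w+$"
)


@dataclass
class Sample:
    sample_id: str
    raw_files: list[str] = field(default_factory=list)


@dataclass
class Experiment:
    experiment_id: str
    samples: dict[str, Sample] = field(default_factory=dict)


@dataclass
class Bait:
    gene: str
    species: str
    experiments: dict[str, Experiment] = field(default_factory=dict)


def link_files(bait: Bait, filenames: list[str]) -> list[str]:
    """Attach raw files to samples under `bait`; return names that need manual linking."""
    unmatched: list[str] = []
    for name in filenames:
        m = NAME_PATTERN.match(name)
        # Files that break the convention, or belong to another bait, are left for the user.
        if m is None or m["bait"] != bait.gene:
            unmatched.append(name)
            continue
        # Create the experiment and sample records on first sight, then reuse them.
        exp = bait.experiments.setdefault(m["exp"], Experiment(m["exp"]))
        sample = exp.samples.setdefault(m["sample"], Sample(m["sample"]))
        sample.raw_files.append(name)
    return unmatched
```

Grouping replicate runs or chromatographic fractions under one experiment then amounts to giving their files the same experiment token.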
In the Analyst module, mass spectrometry data can be explored intuitively, and results from individual samples, experiments or baits can be viewed and filtered (Supplementary Fig. 13–14). The Comparison tool provides an interface for aligning data from multiple baits or MS analyses. Data from individual MS runs, or from any user-defined group of samples, can be selected and displayed side by side in tabular format (Fig. 1b; Supplementary Fig. 15–17). In the Comparison view, control groups and individual baits, experiments or samples are displayed as columns. Proteins identified in each MS run or group of runs are displayed as rows, and each cell corresponds to a putative protein hit that passes a user-specified database search score cutoff. Cells display the spectral count, number of unique peptides, search engine scores and/or protein coverage; hovering the mouse over a cell reveals all associated data. For each protein displayed in the Comparison view, an associated Peptide link (Fig. 1b) reveals the sequence, location, spectral count and score of each of its peptides. Importantly, all search results can be filtered. For example, ProHits can remove non-specific background proteins from the hit list, as defined by negative controls, search engine score thresholds or contaminant lists. Links to the external NCBI and BioGRID [8] databases are provided for each hit to facilitate data interpretation, and overlap with published interaction data housed in BioGRID [8] can be displayed to allow immediate identification of new interaction partners. A flexible export function enables visualization in graphical format with Cytoscape [9], in which spectral counts, unique peptides and search engine scores can be visualized as interaction edge attributes.
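The filtering and Comparison view logic can be illustrated with a small sketch. Here background removal is simplified to three rules (a score cutoff, exclusion of any protein detected in a negative control, and a contaminant list); the actual ProHits filters may be more nuanced, and the `Hit` record, function names and example values are hypothetical. The last function writes a generic tab-delimited edge table that Cytoscape can import as a network, with spectral counts, unique peptides and scores as edge attributes; it is not the ProHits export format.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One protein identification in one MS run (hypothetical record layout)."""
    protein: str
    spectral_count: int
    unique_peptides: int
    score: float


def filter_hits(hits: list[Hit], min_score: float,
                control_proteins: set[str], contaminants: set[str]) -> list[Hit]:
    """Keep hits that pass the score cutoff and are absent from controls and contaminant lists."""
    return [
        h for h in hits
        if h.score >= min_score
        and h.protein not in control_proteins
        and h.protein not in contaminants
    ]


def comparison_table(runs: dict[str, list[Hit]]) -> dict[str, dict[str, int]]:
    """Rows are proteins, columns are baits/runs; each cell holds a spectral count."""
    table: dict[str, dict[str, int]] = {}
    for column, hits in runs.items():
        for h in hits:
            table.setdefault(h.protein, {})[column] = h.spectral_count
    return table


def edge_table(runs: dict[str, list[Hit]]) -> str:
    """Tab-delimited bait-prey edges with attributes, suitable for a network import."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["bait", "prey", "spectral_count", "unique_peptides", "score"])
    for bait, hits in runs.items():
        for h in hits:
            writer.writerow([bait, h.protein, h.spectral_count, h.unique_peptides, h.score])
    return buf.getvalue()
```

With this design, a protein seen in a control run is dropped from every bait column at once, so the table and the exported edges contain only candidate interactors that also pass the score cutoff.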
The Analyst module also includes advanced search functions, bulk export of filtered or unfiltered data, and management of experimental protocols and background lists (e.g. Supplementary Fig. 18–20).

Deposition of all mass spectrometry-associated data in public repositories is likely to become mandatory for publication of proteomics experiments [2, 7, 10]. Open access to raw files is essential for data reanalysis and cross-platform comparison; however, submitting data to public repositories can be laborious because of strict formatting requirements. ProHits facilitates extraction of the necessary details in compliance with current standards, and generates Proteomics Standards Initiative (PSI-MI) v2.5-compliant reports [11], either in the MITAB format for BioGRID [8] or in XML format for submission to IMEx consortium databases [12], including IntAct [13] (Supplementary Fig. 21). The MS raw files associated with a given project can also be easily retrieved and grouped for submission to data repositories such as Tranche [14].

ProHits has been used to manage many large-scale in-house projects, including a systematic analysis of kinase and phosphatase interactions in yeast comprising 986 affinity purifications [15]. Smaller-scale projects from individual laboratories are readily handled in the same manner. Examples of AP-MS data from both yeast and mammalian projects are provided in a demonstration version of ProHits at www.prohitsMS.com and in the Supplementary Information. The modular architecture of ProHits can accommodate new features as future experimental and analytical needs dictate. Although ProHits was designed to handle protein interaction data, simple modifications of the open-source code should allow straightforward adaptation to other proteomics workflows.
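To show what a PSI-MI TAB (MITAB) 2.5 record looks like, the sketch below builds one 15-column, tab-separated line for a bait-prey pair, with "-" marking empty fields. The detection method defaults to affinity chromatography technology (MI:0004) and the interaction type to physical association (MI:0915). The function name, the defaults and the placeholder accessions in the usage example are illustrative assumptions rather than the ProHits exporter, and a real BioGRID or IMEx submission would also need the publication, source database and interaction identifier fields filled in.

```python
from __future__ import annotations

# Column order of PSI-MI TAB 2.5: 15 tab-separated columns; "-" marks an empty field.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_method", "first_author", "publication_ids", "taxid_a", "taxid_b",
    "interaction_types", "source_db", "interaction_ids", "confidence",
]


def mitab25_row(bait_id: str, prey_id: str, taxid: int, species: str, **overrides: str) -> str:
    """Build one MITAB 2.5 line for a bait-prey pair detected by AP-MS (illustrative sketch).

    Identifiers are expected in "db:accession" form; any column can be
    overridden by name, e.g. first_author="...".
    """
    row = dict.fromkeys(MITAB25_COLUMNS, "-")
    organism = f"taxid:{taxid}({species})"
    row.update(
        id_a=bait_id,
        id_b=prey_id,
        detection_method='psi-mi:"MI:0004"(affinity chromatography technology)',
        taxid_a=organism,
        taxid_b=organism,  # assumes bait and prey come from the same organism
        interaction_types='psi-mi:"MI:0915"(physical association)',
    )
    unknown = set(overrides) - set(MITAB25_COLUMNS)
    if unknown:
        raise ValueError(f"unknown MITAB 2.5 columns: {sorted(unknown)}")
    row.update(overrides)
    # dict.fromkeys preserved the column order, so join in MITAB25_COLUMNS order.
    return "\t".join(row[c] for c in MITAB25_COLUMNS)


if __name__ == "__main__":
    # Placeholder accessions, not real UniProt entries.
    print(mitab25_row("uniprotkb:EXAMPLE1", "uniprotkb:EXAMPLE2", 9606, "Homo sapiens"))
```

Rejecting unknown column names keeps a typo in an override from silently producing a malformed record.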

[1] C. Sander et al. The HUPO PSI's Molecular Interaction format—a community standard for the representation of protein interaction data. Nature Biotechnology, 2004.

[2] Zhaohui S. Qin et al. A Global Protein Kinase and Phosphatase Interaction Network in Yeast. Science, 2010.

[3] P. Shannon et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 2003.

[4] James A Hill et al. Proteomics FASTA Archive and Reference Resource. Proteomics, 2008.

[5] R. Aebersold et al. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Molecular Systems Biology, 2005.

[6] Peter Woollard et al. The minimum information required for reporting a molecular interaction experiment (MIMIx). Nature Biotechnology, 2007.

[7] Robertson Craig et al. TANDEM: matching proteins with tandem mass spectra. Bioinformatics, 2004.

[8] M. Moran et al. Large-scale mapping of human protein–protein interactions by mass spectrometry. Molecular Systems Biology, 2007.

[9] Katie Cottingham. MCP ups the ante by mandating raw-data deposition. Journal of Proteome Research, 2009.

[10] R. Aebersold et al. Analysis of protein complexes using mass spectrometry. Nature Reviews Molecular Cell Biology, 2007.

[11] Kara Dolinski et al. The BioGRID Interaction Database: 2008 update. Nucleic Acids Research, 2008.

[12] Lennart Martens et al. The minimum information about a proteomics experiment (MIAPE). Nature Biotechnology, 2007.

[13] Henning Hermjakob et al. Submit Your Interaction Data the IMEx Way. Proteomics, 2007.

[14] Y. Zhang et al. IntAct—open source resource for molecular interaction data. Nucleic Acids Research, 2006.

[15] D. N. Perkins et al. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 1999.