论文信息 - Combining Results of Multiple Search Engines

Combining Results of Multiple Search Engines

A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques. Molecular & Cellular Proteomics 12: 10.1074/mcp.R113.027797, 2383– 2393, 2013. The most commonly used proteomics approach, shotgun proteomics, has become an invaluable tool for the highthroughput characterization of proteins in biological samples (1). This workflow relies on the combination of protein digestion, liquid chromatography (LC) 1 separation, tandem mass spectrometry (MS/MS), and sophisticated data analysis in its aim to derive an accurate and complete set of peptides and their inferred proteins that are present in the sample being studied. Although many variations are possible, the typical workflow begins with the digestion of proteins into peptides with a protease, typically trypsin. The resulting peptide mixture is first separated via LC and then subjected to mass spectrometry (MS) analysis. The MS instrument acquires fragment ion spectra on a subset of the peptide precursor ions that it measures. From the MS/MS spectra that measure the abundance and mass of the peptide ion fragments, peptides present in the mixture are identified and proteins are inferred by means of downstream computational analysis. The informatics component of the shotgun proteomics workflow is crucial for proper data analysis (2), and a wide variety of tools have emerged for this purpose (3). The typical informatics workflow can be summarized in a few steps: conversion from vendor proprietary formats to an open format, high-throughput interpretation of the MS/MS spectra with a search engine, and statistical validation of the results with estimation of the false discovery rate at a selected score threshold. Various tools for measuring relative peptide abundances may be applied, dependent on the type of quantitation technique applied in the experiment. Finally, the proteins present, and their abundance in the sample, are inferred based on the peptide identifications. One of the most computationally intensive and diverse steps in the computational analysis workflow is the use of a