MapReduce implementation of a hybrid spectral library-database search method for large-scale peptide identification
Summary: We present MR-MSPolygraph, a MapReduce-based implementation for parallelizing peptide identification from mass spectrometry data. The underlying serial method, MSPolygraph, uses a novel hybrid approach that matches an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours when processing tens of thousands of experimental spectra. Speedup and related performance studies are also reported on a 400-core Hadoop cluster, using spectral datasets from environmental microbial communities as inputs.
Availability: The source code and user documentation are available at http://compbio.eecs.wsu.edu/MR-MSPolygraph.
Contact: ananth@eecs.wsu.edu; william.cannon@pnnl.gov
Supplementary Information: Supplementary data are available at Bioinformatics online.
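To make the parallelization idea in the summary concrete, the following is a minimal sketch of how a spectrum search can be phrased as a map step and a reduce step. It is not the MR-MSPolygraph code and does not use the MSPolygraph scoring model: the shared-peak count, the toy peak lists, and the names `DATABASE_PEAKS`, `LIBRARY_PEAKS`, `map_spectrum`, `reduce_hits` and `run` are all assumptions made for illustration. The sketch runs in a single process; on a Hadoop cluster the map calls would be distributed across workers.

```python
from collections import defaultdict

# Hypothetical toy references: peptide -> fragment m/z values.
# DATABASE_PEAKS stands in for model spectra derived from a sequence database,
# LIBRARY_PEAKS for previously observed spectra in a spectral library.
DATABASE_PEAKS = {
    "PEPTIDE": [100.0, 200.0, 300.0],
    "PROTEIN": [110.0, 210.0, 310.0],
}
LIBRARY_PEAKS = {
    "PEPTIDE": [100.1, 200.1, 299.9, 400.0],
}


def shared_peaks(observed, reference, tol=0.5):
    """Count observed peaks lying within tol of any reference peak (toy score)."""
    return sum(1 for mz in observed if any(abs(mz - r) <= tol for r in reference))


def map_spectrum(spectrum_id, peaks):
    """Map step: score one spectrum against every candidate from both sources."""
    sources = (("database", DATABASE_PEAKS), ("library", LIBRARY_PEAKS))
    for source, candidates in sources:
        for peptide, reference in candidates.items():
            yield spectrum_id, (shared_peaks(peaks, reference), peptide, source)


def reduce_hits(spectrum_id, hits):
    """Reduce step: keep the best hit (ties broken by tuple ordering)."""
    return spectrum_id, max(hits)


def run(spectra):
    """Simulate map, shuffle (group by spectrum id) and reduce in one process."""
    grouped = defaultdict(list)
    for spectrum_id, peaks in spectra.items():
        for key, value in map_spectrum(spectrum_id, peaks):
            grouped[key].append(value)
    return dict(reduce_hits(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run({"s1": [100.0, 200.0, 300.0, 400.0], "s2": [110.0, 210.0]}))
```

Because each spectrum is scored independently of the others, the map step needs no communication between workers, which is the property that makes this kind of search a natural fit for MapReduce.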