Xolik: finding cross-linked peptides with maximum paired scores in linear time

Motivation: Cross-linking coupled with mass spectrometry (MS) is widely used in the analysis of protein structures and protein-protein interactions. To identify cross-linked peptides from MS data, we need to consider all pairwise combinations of peptides, which is computationally prohibitive when the sequence database is large. To alleviate this problem, heuristic screening strategies are used to reduce the number of peptide pairs considered during identification. However, heuristic screening criteria may miss true findings.

Results: We directly tackle the combination challenge without using any screening strategies. Using an additive scoring function and a double-ended queue, the proposed algorithm reduces the quadratic time complexity of exhaustive search to linear time. We implement the algorithm in a tool named Xolik, and the running time of Xolik is validated using databases with different numbers of proteins. Experiments using synthetic and empirical datasets show that Xolik outperforms existing tools in terms of running time and statistical power.

Availability: Source code and binaries of Xolik are freely available at http://bioinformatics.ust.hk/Xolik.html.

Contact: eeyu@ust.hk

Supplementary information: Supplementary data are available at Bioinformatics online.
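To illustrate how an additive score and a double-ended queue can avoid enumerating all peptide pairs, the following is a minimal sketch of a sliding-window maximum over mass-sorted peptides. It is not Xolik's actual implementation: the function name `best_pair`, its parameters, and the simplifications (one score per peptide, a symmetric mass tolerance, `target` taken as the precursor mass with the linker mass already subtracted, self-pairing allowed) are all assumptions made for illustration.

```python
from collections import deque


def best_pair(masses, scores, target, tol):
    """Return (best_score, i, j) maximizing scores[i] + scores[j] subject to
    |masses[i] + masses[j] - target| <= tol, or None if no pair qualifies.

    Hypothetical helper: `masses` must be sorted ascending, and scores[k]
    is the (assumed precomputed) score of peptide k against the spectrum.
    """
    n = len(masses)
    window = deque()  # candidate indices j; their scores decrease front to back
    nxt = n - 1       # next index to admit; partners are scanned heaviest first
    best = None
    for i in range(n):
        lo = target - tol - masses[i]  # lightest acceptable partner mass
        hi = target + tol - masses[i]  # heaviest acceptable partner mass
        # As i grows, lo and hi both shrink, so the window only moves left.
        while nxt >= 0 and masses[nxt] >= lo:
            # Drop dominated candidates: they are no better and expire sooner.
            while window and scores[window[-1]] <= scores[nxt]:
                window.pop()
            window.append(nxt)
            nxt -= 1
        # Expire partners that are now too heavy (the oldest sit at the front).
        while window and masses[window[0]] > hi:
            window.popleft()
        if window:
            j = window[0]  # front holds the maximum score within the window
            cand = scores[i] + scores[j]
            if best is None or cand > best[0]:
                best = (cand, i, j)
    return best
```

Each index enters and leaves the deque at most once, so the scan over a sorted list takes time linear in the number of peptides, whereas checking every pair is quadratic. The trick relies on the pair score being a sum of per-peptide scores, which is what lets the best partner for each peptide be read off the front of the deque.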
