Availability of MudPIT data for classification of biological samples

Background
Mass spectrometry is an important analytical tool for clinical proteomics. Primarily employed for biomarker discovery, it is increasingly used to develop methods that may help provide an unambiguous diagnosis of biological samples. In this context, we investigated the classification of phenotypes by applying a support vector machine (SVM) to experimental data obtained by the MudPIT approach. In particular, we compared the performance of the SVM using two independent collections of complex samples and different data types, such as mass spectra (m/z), peptides and proteins.

Results
Overall, protein and peptide data provided more discriminant information than experimental mass spectra (overall accuracy higher than 87% in both collections 1 and 2). These results indicate that sequencing of peptides and proteins reduces the experimental noise affecting the raw mass spectra and allows the extraction of more informative features for the effective classification of samples. In addition, the protein and peptide features selected by the SVM showed an 80% match with the differentially expressed proteins identified by the MAProMa software.

Conclusions
These findings confirm the availability of most label-free quantitative methods based on the processing of spectral counts and SEQUEST-based SCORE values. They also stress the usefulness of MudPIT data for correctly grouping sample phenotypes by applying both supervised and unsupervised learning algorithms. This capacity permits the evaluation of actual samples and is a good starting point for translating proteomic methodology to clinical application.
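The classification step described in the Background can be illustrated with a minimal sketch. The abstract does not state the SVM kernel, the feature encoding, or the validation scheme, so everything below is an assumption made for illustration: a linear SVM trained by subgradient descent on the hinge loss, a hypothetical spectral-count matrix (rows are samples, columns are proteins), and invented protein names and counts. The reported accuracy is training accuracy on toy data, not a reproduction of the study's results.

```python
from math import sqrt

# Hypothetical spectral counts: rows are samples, columns are proteins.
proteins = ["protein_A", "protein_B", "protein_C"]
counts = [
    [20, 5, 3], [18, 6, 2], [22, 4, 3],   # phenotype 1 (label +1)
    [5, 5, 15], [4, 6, 17], [6, 4, 14],   # phenotype 2 (label -1)
]
labels = [1, 1, 1, -1, -1, -1]


def standardize(rows):
    """Scale each column to zero mean and unit (population) variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the predicted label (+1 or -1) for one standardized sample."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = standardize(counts)
w, b = train_linear_svm(X, labels)

# Training accuracy only; a real study would use held-out or cross-validated data.
accuracy = sum(predict(w, b, x) == y for x, y in zip(X, labels)) / len(labels)

# Rank proteins by absolute weight, a simple proxy for SVM-based feature selection.
ranked = sorted(zip(proteins, w), key=lambda t: abs(t[1]), reverse=True)

print(f"training accuracy: {accuracy:.2f}")
for name, weight in ranked:
    print(f"{name}: {weight:+.3f}")
```

Ranking features by the magnitude of their weights is one common way to read a linear SVM. The resulting list could then be compared with proteins flagged as differentially expressed by another tool, which is the kind of comparison the Results report for MAProMa.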
