Comparison of different statistical approaches for urinary peptide biomarker detection in the context of coronary artery disease

Background: When combined with a clinical outcome variable, the size, complexity and nature of mass-spectrometry proteomics data pose considerable statistical challenges for the discovery of potential disease-associated biomarkers. The purpose of this study was therefore to evaluate the effectiveness of different statistical methods for urinary proteomic biomarker discovery, and of different methods of classifier modelling, with respect to the diagnosis of coronary artery disease in 197 study subjects and the prognostication of acute coronary syndromes in 368 study subjects.

Results: Analysing the discovery sub-cohorts, which comprised 2/3 of the study subjects, with the Wilcoxon rank sum test, t-score, cat-score, binary discriminant analysis and random forests yielded widely differing numbers of potential peptide biomarkers (ranging from 2 to 398). Moreover, these biomarker patterns showed very little overlap; fragments of type I and III collagens were the only common denominator. However, these differences in biomarker patterns mostly did not translate into significantly different performance of the diagnostic or prognostic classifiers modelled by support vector machine, diagonal discriminant analysis, linear discriminant analysis, binary discriminant analysis and random forest. This held even when different biomarker patterns were combined into master patterns.

Conclusion: Our study revealed a considerable dependence of peptide biomarker discovery on the statistical method applied to the urinary peptide profiles, while the observed diagnostic and/or prognostic reliability of the classifiers was largely independent of the modelling approach. This may, however, be due to the limited statistical power in classifier testing. Nonetheless, our study showed that urinary proteome analysis has the potential to provide valuable biomarkers for coronary artery disease, mirroring in particular alterations in the extracellular matrix. It further showed that, for a comprehensive discovery of biomarkers and thus of pathological information, the results of different statistical methods may best be combined into a master pattern that can then be used for classifier modelling.
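The idea of comparing biomarker patterns and merging them into a master pattern can be illustrated with a minimal sketch. It assumes that each discovery method returns a set of selected peptide identifiers and that the master pattern is their plain union; the abstract does not specify how the patterns were actually combined. The function names and peptide IDs below are hypothetical and not taken from the study.

```python
from itertools import combinations


def master_pattern(patterns):
    """Union of the peptide IDs selected by any method (assumed combination rule)."""
    combined = set()
    for selected in patterns.values():
        combined |= set(selected)
    return combined


def common_core(patterns):
    """Peptide IDs selected by every method (the 'common denominator')."""
    sets = [set(selected) for selected in patterns.values()]
    return set.intersection(*sets) if sets else set()


def pairwise_jaccard(patterns):
    """Jaccard overlap for each pair of methods, keyed by sorted method names."""
    overlaps = {}
    for a, b in combinations(sorted(patterns), 2):
        set_a, set_b = set(patterns[a]), set(patterns[b])
        union = set_a | set_b
        overlaps[(a, b)] = len(set_a & set_b) / len(union) if union else 0.0
    return overlaps


# Hypothetical selections; real patterns would come from the discovery cohort.
patterns = {
    "wilcoxon": {"pep_001", "pep_002", "pep_003"},
    "t_score": {"pep_002", "pep_003", "pep_004"},
    "random_forest": {"pep_003", "pep_005"},
}

if __name__ == "__main__":
    print(sorted(master_pattern(patterns)))
    print(sorted(common_core(patterns)))
    print(pairwise_jaccard(patterns))
```

A low pairwise overlap together with a small common core would correspond to the situation described in the results, where the methods agreed only on a few peptides. The resulting master pattern would then serve as the feature set for classifier modelling.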
