Multiprobabilistic prediction in early medical diagnoses

This paper describes a methodology for providing multiprobability predictions for proteomic mass spectrometry data. The methodology is based on a newly developed machine learning framework called Venn machines, which output a valid probability interval for each prediction. For demonstration, we applied the methodology to MALDI-TOF data sets in order to predict the diagnosis of heart disease and the early diagnosis of ovarian cancer and breast cancer. The experiments showed that the probability intervals are narrow, that is, the output of the multiprobability predictor is close to a single probability distribution. In addition, the probability intervals produced for the heart disease and ovarian cancer data were more accurate than the output of the corresponding probability predictor. When Venn machines were forced to make point predictions, the accuracy of those predictions was, for most of the data sets, better than the accuracy of the underlying algorithm, which outputs a single probability distribution over the labels. Application of the methodology to MALDI-TOF data sets empirically demonstrates its validity. The accuracy of the proposed method on ovarian cancer data rises from 66.7 % at 11 months before the moment of diagnosis to up to 90.2 % at the moment of diagnosis. The same approach was applied to heart disease data, which have no time dependency, although the accuracy achieved was lower (up to 69.9 %). The methodology also allowed us to confirm mass spectrometry peaks previously identified as carrying statistically significant information for discriminating between controls and cases.
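The way a Venn machine turns an underlying classifier into a probability interval and a point prediction can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which taxonomy or underlying algorithm was used, so the sketch assumes a 1-nearest-neighbour taxonomy, the function names (`nearest_label`, `venn_predict`, `point_prediction`) are hypothetical, and the one-dimensional toy data are invented for illustration rather than taken from the MALDI-TOF data sets.

```python
from collections import Counter


def nearest_label(i, xs, ys):
    """Label of the nearest neighbour of example i, excluding i itself."""
    j = min((k for k in range(len(xs)) if k != i),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(xs[i], xs[k])))
    return ys[j]


def venn_predict(train_x, train_y, x_new, labels):
    """Return {label: (lower, upper)} probability bounds for x_new.

    Assumed taxonomy: an example's category is the label of its nearest
    neighbour. Each hypothetical label for x_new yields one probability
    distribution; the bounds are taken across those distributions.
    """
    rows = {}
    for y_try in labels:
        xs = train_x + [x_new]
        ys = train_y + [y_try]          # tentatively label the new object
        cats = [nearest_label(i, xs, ys) for i in range(len(xs))]
        # Labels of all examples that share the new object's category.
        same = [y for y, c in zip(ys, cats) if c == cats[-1]]
        counts = Counter(same)
        rows[y_try] = {k: counts[k] / len(same) for k in labels}
    return {k: (min(rows[y][k] for y in labels),
                max(rows[y][k] for y in labels)) for k in labels}


def point_prediction(intervals):
    """Forced point prediction: the label with the largest lower bound."""
    return max(intervals, key=lambda k: intervals[k][0])


# Toy data (illustrative only): one feature per sample.
train_x = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
train_y = ["control"] * 3 + ["case"] * 3
labels = ["control", "case"]

intervals = venn_predict(train_x, train_y, (1.05,), labels)
print(intervals)                    # {'control': (0.0, 0.5), 'case': (0.5, 1.0)}
print(point_prediction(intervals))  # case
```

With only six training examples the interval is wide; the abstract's observation that the intervals are narrow refers to the much larger real data sets, where adding one tentatively labelled example changes the category frequencies only slightly.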
