Computational diagnosis and risk evaluation for canine lymphoma

The canine lymphoma blood test measures the levels of two biomarkers, the acute phase proteins C-Reactive Protein (CRP) and Haptoglobin. The test can be used for diagnosis, for screening, and for remission monitoring. We analyze clinical data, test various machine learning methods and select the best approach to each of these problems. Three families of methods are used for classification and risk estimation: decision trees, kNN (including advanced and adaptive kNN), and probability density evaluation with radial basis functions. Several pre-processing approaches were implemented and compared, and the best of them were used to create the diagnostic system. For differential diagnosis, the best solution gives a sensitivity of 83.5% and a specificity of 77% (using three input features: CRP, Haptoglobin and a standard clinical symptom). For screening, the decision tree method provides the best result, with a sensitivity of 81.4% and a specificity above 99% (using the same input features). If the clinical symptom (lymphadenopathy) is treated as unknown, a decision tree using only CRP and Haptoglobin provides a sensitivity of 69% and a specificity of 83.5%. The lymphoma risk evaluation problem is formulated and solved. The best models are selected to form a system for computational lymphoma diagnosis and for evaluation of the risk of lymphoma. These methods are implemented in dedicated web-accessible software and applied to monitoring dogs with lymphoma after treatment; the software detects recurrence of lymphoma up to two months before clinical signs appear. The risk map visualization provides a user-friendly tool for exploratory data analysis.
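The abstract does not give formulas for the risk estimate. The following is a minimal sketch, assuming one common construction: each class-conditional density is estimated as a mean of Gaussian radial basis functions centred on the training cases (a Parzen-window estimate with a shared width `h`), and the lymphoma risk is the posterior probability of the lymphoma class under a given prior. The same sketch shows how sensitivity (TP / (TP + FN)) and specificity (TN / (TN + FP)) are computed from labelled predictions. The function names, the width `h`, the prior, and the toy points are hypothetical illustrations, not the authors' implementation or clinical data.

```python
import math
from typing import Sequence, Tuple

# A case is a feature vector, e.g. (scaled CRP, scaled Haptoglobin).
# ASSUMPTION: features are standardised beforehand, because the two
# proteins are measured on different scales.
Point = Tuple[float, ...]


def gaussian_rbf(x: Point, centre: Point, h: float) -> float:
    """Unnormalised Gaussian radial basis function of width h."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-d2 / (2.0 * h * h))


def rbf_density(x: Point, samples: Sequence[Point], h: float) -> float:
    """Parzen-window density estimate: the mean of RBFs centred on the samples.

    The Gaussian normalising constant is dropped; it is identical for both
    classes (same h, same dimension), so it cancels in lymphoma_risk.
    """
    return sum(gaussian_rbf(x, s, h) for s in samples) / len(samples)


def lymphoma_risk(x: Point, lymphoma: Sequence[Point], healthy: Sequence[Point],
                  h: float = 1.0, prior: float = 0.5) -> float:
    """Posterior probability of the lymphoma class at x (Bayes' rule).

    Hypothetical helper: h and prior are illustrative defaults, not values
    taken from the study.
    """
    p1 = prior * rbf_density(x, lymphoma, h)
    p0 = (1.0 - prior) * rbf_density(x, healthy, h)
    total = p1 + p0
    # Far from all training cases both densities underflow to 0.0;
    # fall back to the prior in that case.
    return p1 / total if total > 0.0 else prior


def sensitivity_specificity(y_true: Sequence[bool],
                            y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) = (TP/(TP+FN), TN/(TN+FP))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical toy points in standardised units, NOT clinical data.
    lymphoma_cases = [(2.0, 2.0), (2.5, 1.5), (3.0, 2.5)]
    healthy_cases = [(0.0, 0.0), (0.5, -0.5), (-0.5, 0.5)]
    for query in [(2.5, 2.0), (0.0, 0.0)]:
        risk = lymphoma_risk(query, lymphoma_cases, healthy_cases)
        print(query, round(risk, 3))
```

Evaluating a risk over a grid of feature values gives a simple risk map of the kind the abstract mentions. A shared width `h` is used so that the dropped normalising constants cancel; in practice `h` would need tuning, and thresholding the risk (here it could be at 0.5) trades sensitivity against specificity.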
