Classification of tumor marker values using heuristic data mining methods

Tumor markers are substances found in blood, urine, or body tissues that are used as indicators for tumors; elevated tumor marker values can indicate the presence of cancer, but they can also have other causes. We have used a medical database compiled at the blood laboratory of the General Hospital Linz, Austria, which contains routine blood values for thousands of patients along with measurements of several tumor markers. We have applied several data-based modeling approaches to identify mathematical models that estimate selected tumor marker values from routinely available blood values; specifically, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified and are analyzed in this paper. The documented tumor marker values are classified as "normal" or "elevated"; our goal is to design classifiers for the respective binary classification problems. As we show in the results section, for the medical modeling tasks described here, genetic programming (GP) performs best among the techniques that are able to identify nonlinearities; we also see that GP results show less overfitting than those produced by the other methods.
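The binary classification setup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the cutoff value, and the sample numbers are hypothetical and chosen only for illustration, and the gap between training and test accuracy is just one simple way to express overfitting.

```python
from typing import List, Sequence


def label_elevated(values: Sequence[float], cutoff: float) -> List[int]:
    """Label each marker value 1 ("elevated") if it exceeds the cutoff, else 0 ("normal").

    The cutoff is a hypothetical parameter; real reference ranges are marker-specific.
    """
    return [1 if v > cutoff else 0 for v in values]


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the documented class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one sample is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def overfitting_gap(train_accuracy: float, test_accuracy: float) -> float:
    """Training accuracy minus test accuracy; a larger gap suggests more overfitting."""
    return train_accuracy - test_accuracy


if __name__ == "__main__":
    # Made-up marker values and a made-up cutoff, for illustration only.
    documented = label_elevated([1.2, 5.0, 4.0, 7.3], cutoff=4.0)
    # Hypothetical classifier output for the same four samples.
    predicted = [0, 1, 1, 1]
    print(documented)
    print(accuracy(predicted, documented))
```

A classifier estimated from blood values would be compared against the documented labels in this way on both training and test partitions, and the two accuracies would then be passed to `overfitting_gap`.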
