Wavelet Feature Extraction and Genetic Algorithm for Biomarker Detection in Colorectal Cancer Data

Biomarkers that predict patient survival play an important role in medical diagnosis and treatment. Selecting the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper, a novel method is proposed to detect prognostic biomarkers of survival in colorectal cancer patients using wavelet analysis, a genetic algorithm, and a Bayes classifier. The one-dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study, the one-dimensional continuous wavelet transform (CWT) is used instead to extract features from colorectal cancer data. The one-dimensional CWT does not reduce the dimensionality of the data, but it captures features that the DWT misses and is therefore complementary to the DWT. A genetic algorithm was run on the extracted wavelet coefficients to select the optimized features, with a Bayes classifier used to build its fitness function. The corresponding protein markers were then located from the positions of the optimized features. Kaplan-Meier curves and a Cox regression model were used to evaluate the performance of the selected biomarkers. Experiments were conducted on a colorectal cancer dataset, and several significant biomarkers were detected. A new protein biomarker, CD46, was found to be significantly associated with survival time.
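The pipeline described above (CWT features, then genetic-algorithm selection scored by a Bayes classifier, then mapping selected positions back to markers) can be illustrated with a minimal sketch. It is not the authors' implementation. The Mexican hat wavelet, the single scale, the binary-mask chromosome, the Gaussian naive Bayes fitness measured on training accuracy, and all function names and toy data are assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random

def mexican_hat(t):
    """Mexican hat mother wavelet, up to a constant factor (assumed choice)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt_row(signal, scale):
    """CWT coefficients at one scale; output has the same length as the input."""
    n = len(signal)
    half = int(4 * scale)  # truncate the wavelet's effective support
    coeffs = []
    for b in range(n):
        acc = 0.0
        for k in range(max(0, b - half), min(n, b + half + 1)):
            acc += signal[k] * mexican_hat((k - b) / scale)
        coeffs.append(acc / math.sqrt(scale))
    return coeffs

def nb_accuracy(X, y, mask):
    """Training accuracy of Gaussian naive Bayes on the features where mask is True."""
    idx = [j for j, m in enumerate(mask) if m]
    if not idx:
        return 0.0
    stats = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        means = [sum(r[j] for r in rows) / len(rows) for j in idx]
        vars_ = [sum((r[j] - m) ** 2 for r in rows) / len(rows) + 1e-6
                 for j, m in zip(idx, means)]
        stats[c] = (math.log(len(rows) / len(X)), means, vars_)
    correct = 0
    for x, lab in zip(X, y):
        def score(c):
            prior, means, vars_ = stats[c]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
                for j, m, v in zip(idx, means, vars_))
        correct += max(stats, key=score) == lab
    return correct / len(X)

def select_features(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Simple elitist GA over binary feature masks (assumed encoding)."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    fitness = lambda m: nb_accuracy(X, y, m)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = parents + children
    return max(pop, key=fitness)

# Toy data (made up): 8 "patients", 12 marker values; marker 5 is shifted in class 1.
data_rng = random.Random(1)
y = [0, 0, 0, 0, 1, 1, 1, 1]
raw = [[data_rng.gauss(0, 1) + (3.0 if (j == 5 and lab == 1) else 0.0)
        for j in range(12)] for lab in y]
features = [cwt_row(row, scale=1.0) for row in raw]   # same dimensionality as raw
best = select_features(features, y)
# With one scale, coefficient position j corresponds to marker position j.
selected_positions = [j for j, m in enumerate(best) if m]
```

Because the CWT output has the same length as the input, each selected coefficient index can be read directly as a marker position, which is the property the abstract relies on when locating protein markers. A real analysis would use several scales, a held-out or cross-validated fitness rather than training accuracy, and survival models (Kaplan-Meier, Cox regression) to evaluate the selected markers.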
