A Model Based on Genetic Algorithm for Colorectal Cancer Diagnosis

In this paper we present a method based on a genetic algorithm that is capable of analyzing a large number of features obtained from fractal techniques, Haralick texture descriptors and curvelet coefficients, together with several feature selection methods and classifiers, for the study and pattern recognition of colorectal cancer. Each individual was defined by a chromosome of four genes. The evaluation, selection, crossover and mutation steps were designed to distinguish the colorectal cancer groups with the highest accuracy rate and the smallest number of features. The tests were performed with features extracted from H&E-stained histological images, with different population sizes and numbers of iterations, and with k-fold cross-validation. The best result was obtained with a population of 500 individuals and 50 iterations, applying Relief, random forest and 29 features (obtained mainly from the combination of percolation measures and curvelet subimages). This solution distinguished the groups with an accuracy rate of 90.82% and an AUC of 0.967.
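
The loop of evaluation, selection, crossover and mutation over four-gene individuals can be illustrated with a minimal sketch. The abstract does not say what each gene encodes, so the layout used here (feature set, selection method, number of features, classifier) is an assumption, as are all gene values, the tournament selection, the elitism step, the mutation rate and the `demo_fitness` function. In a real run the fitness would be a k-fold cross-validated accuracy combined with a penalty on the number of features; the synthetic score below only stands in for it and reproduces none of the reported results.

```python
import random

# Hypothetical gene domains: the source only states that an individual has four genes.
FEATURE_SETS = ["fractal", "haralick", "curvelet", "combined"]
SELECTORS = ["relief", "info_gain", "chi_squared"]
CLASSIFIERS = ["random_forest", "svm", "knn"]
MAX_FEATURES = 100


def random_individual(rng):
    """Draw one individual: (feature set, selector, feature count, classifier)."""
    return (
        rng.choice(FEATURE_SETS),
        rng.choice(SELECTORS),
        rng.randint(1, MAX_FEATURES),
        rng.choice(CLASSIFIERS),
    )


def demo_fitness(ind):
    """Synthetic stand-in for accuracy minus a feature-count penalty (not real data)."""
    _, _, n_features, _ = ind
    pseudo_accuracy = min(n_features, 20) / 20   # saturates after 20 features
    penalty = 0.1 * n_features / MAX_FEATURES    # favours smaller feature subsets
    return pseudo_accuracy - penalty


def crossover(a, b, rng):
    """Single-point crossover between two four-gene parents."""
    point = rng.randint(1, 3)
    return a[:point] + b[point:]


def mutate(ind, rng, rate=0.1):
    """Resample each gene from its domain with probability `rate`."""
    fresh = random_individual(rng)
    return tuple(f if rng.random() < rate else g for g, f in zip(ind, fresh))


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def evolve(fitness, pop_size=50, generations=20, seed=0):
    """Run the genetic algorithm and return the best individual found."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    print(evolve(demo_fitness, pop_size=30, generations=10, seed=1))
```

Passing the fitness as a function keeps the search loop independent of how an individual is scored, so the synthetic score can be swapped for a cross-validated classifier evaluation without touching the evolutionary operators.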
