Observer-invariant histopathology using genetics-based machine learning

Prostate cancer accounts for one-third of noncutaneous cancers diagnosed in US men and is a leading cause of cancer-related death. Advances in Fourier transform infrared spectroscopic imaging now provide very large data sets describing both the structural and local chemical properties of cells within prostate tissue. By uniting spectroscopic imaging data with computer-aided diagnosis (CADx), our long-term goal is to provide a new approach to pathology by automating the recognition of cancer in complex tissue. The first step toward creating such CADx tools is a mechanism for automatically learning to classify tissue types, a key step in the diagnosis process. Here we demonstrate that genetics-based machine learning (GBML) can be used to approach this problem. Doing so, however, requires scalable GBML implementations that can process very large data sets. In this paper, we propose and validate an efficient GBML technique, NAX, based on an incremental genetics-based rule learner. NAX exploits massive parallelism via the message passing interface (MPI) and performs efficient rule matching using hardware-implemented operations. Results demonstrate that NAX performs prostate tissue classification efficiently, making a compelling case for GBML implementations as efficient and powerful tools for biomedical image processing.
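The idea of matching rules with hardware-implemented operations can be illustrated with a small sketch. The abstract does not state how NAX encodes rules, so the example assumes binary-encoded inputs and ternary rules stored as a bit mask plus required bits; the names `Rule`, `matches`, and `classify` are hypothetical and are not taken from NAX. Under these assumptions a match reduces to one bitwise AND and one comparison, and an ordered rule list is evaluated first match first.

```python
from typing import List, Optional, Tuple

# Hypothetical rule encoding (an assumption, not the NAX format):
# (care_mask, required_bits, predicted_class).
# Bits set in care_mask must equal the corresponding bits in required_bits;
# all other bits are "don't care".
Rule = Tuple[int, int, int]


def matches(rule: Rule, x: int) -> bool:
    """Return True if instance x satisfies the rule's condition."""
    care_mask, required_bits, _ = rule
    # A single AND plus a comparison, both native machine operations.
    return (x & care_mask) == required_bits


def classify(rules: List[Rule], x: int, default: Optional[int] = None) -> Optional[int]:
    """Return the class of the first matching rule, or default if none match."""
    for rule in rules:
        if matches(rule, x):
            return rule[2]
    return default


# Two illustrative rules over 4-bit inputs.
rules: List[Rule] = [
    (0b1100, 0b1000, 1),  # condition "10##" predicts class 1
    (0b0011, 0b0001, 0),  # condition "##01" predicts class 0
]
```

For example, `classify(rules, 0b1011)` selects the first rule, while an input that matches no rule falls through to `default`. Real-valued spectral features would need a different condition encoding, such as interval bounds compared with vector instructions, which this sketch does not attempt.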
