An Epicurean learning approach to gene-expression data classification

We investigate the use of perceptrons for classification of microarray data, using two datasets published in [Nat. Med. 7 (6) (2001) 673] and [Science 286 (1999) 531]. The classification problem studied by Khan et al. concerns the diagnosis of small round blue cell tumours (SRBCT) of childhood, which are difficult to classify both clinically and via routine histology. Golub et al. study acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). We use a simulated annealing-based method to learn a system of perceptrons, each obtained by resampling the training set. Our results are comparable to those of Khan et al. and Golub et al., indicating that there is a role for perceptrons in the classification of tumours based on gene-expression data. We also show that feature selection is critical in this type of model, and we propose a method for identifying genes that might be significant for the particular tumour types. For the SRBCT data, zero test error is obtained using only 13 of the 2308 genes; for the ALL/AML problem, zero test error is obtained using only 9 of the 7129 genes. Furthermore, we provide evidence that Epicurean-style learning and simulated annealing-based search are both essential for obtaining the best classification results.
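
The described approach can be summarized in a short sketch. The Python code below is not the authors' implementation; the function names, the weight-perturbation scheme, and all hyperparameters are illustrative assumptions. It shows the two ingredients named in the abstract: each perceptron is trained on a bootstrap resample of the training set by a simulated annealing search over its weight vector (with a logarithmic cooling schedule), and the resulting system of perceptrons classifies by majority vote.

```python
# Minimal sketch of Epicurean-style ensemble learning of perceptrons,
# assuming class labels encoded as -1/+1 and a fixed feature (gene) subset.
import numpy as np

rng = np.random.default_rng(0)

def misclassifications(w, X, y):
    """Count training errors of the linear threshold unit sign(X @ w)."""
    return int(np.sum(np.sign(X @ w) != y))

def anneal_perceptron(X, y, steps=5000, c=1.0):
    """Optimize one perceptron's weights by simulated annealing with a
    logarithmic cooling schedule, temperature ~ c / log(1 + t)."""
    w = rng.normal(size=X.shape[1])
    best_w, best_err = w.copy(), misclassifications(w, X, y)
    err = best_err
    for t in range(1, steps + 1):
        temperature = c / np.log(1.0 + t)
        candidate = w + rng.normal(scale=0.1, size=w.shape)  # assumed Gaussian move
        cand_err = misclassifications(candidate, X, y)
        delta = cand_err - err
        # Accept improvements always, deteriorations with Boltzmann probability.
        if delta <= 0 or rng.random() < np.exp(-delta / temperature):
            w, err = candidate, cand_err
            if err < best_err:
                best_w, best_err = w.copy(), err
    return best_w

def train_ensemble(X, y, n_perceptrons=25):
    """Epicurean-style learning: one perceptron per bootstrap resample."""
    n = X.shape[0]
    ensemble = []
    for _ in range(n_perceptrons):
        idx = rng.integers(0, n, size=n)  # resample the training set with replacement
        ensemble.append(anneal_perceptron(X[idx], y[idx]))
    return ensemble

def predict(ensemble, X):
    """Combine the individual perceptrons by majority vote."""
    votes = np.sum([np.sign(X @ w) for w in ensemble], axis=0)
    return np.where(votes >= 0, 1, -1)

# Toy usage with synthetic data standing in for a small set of selected genes;
# real inputs would be expression values of the chosen features for two classes.
X_train = rng.normal(size=(38, 9))
y_train = np.where(X_train[:, 0] + 0.5 * X_train[:, 1] > 0, 1, -1)
ensemble = train_ensemble(X_train, y_train)
print("training errors:", int(np.sum(predict(ensemble, X_train) != y_train)))
```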

[1] Robert E. Schapire et al., The strength of weak learnability, Mach. Learn., 1990.

[2] Peter L. Bartlett et al., The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network, IEEE Trans. Inf. Theory, 1998.

[3] J. K. Lenstra et al., Local Search in Combinatorial Optimisation, 1997.

[4] Chak-Kuen Wong et al., Combining the Perceptron Algorithm with Logarithmic Simulated Annealing, Neural Processing Letters, 2001.

[5] Yves Crama et al., Local Search in Combinatorial Optimization, Artificial Neural Networks, 2018.

[6] Bruce E. Hajek et al., Cooling Schedules for Optimal Annealing, Math. Oper. Res., 1988.

[7] T. R. Golub et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 1999.

[8] F. Rosenblatt, Principles of Neurodynamics, 1962.

[9] N. Maughan et al., An introduction to arrays, The Journal of Pathology, 2001.

[10] Jason Weston et al., Gene Selection for Cancer Classification using Support Vector Machines, Machine Learning, 2002.

[11] Nello Cristianini et al., Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, 2000.

[12] Partha S. Vasisht, Computational Analysis of Microarray Data, 2003.

[13] Vladimir Vapnik, Statistical Learning Theory, 1998.

[14] Yoav Freund et al., Boosting the margin: A new explanation for the effectiveness of voting methods, ICML, 1997.

[15] J. Khan et al., Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nature Medicine, 2001.

[16] M. Taupitz et al., A Modified Perceptron Algorithm for Computer-Assisted Diagnosis, 2001.

[17] Hans Ulrich Simon et al., Robust Trainability of Single Neurons, J. Comput. Syst. Sci., 1995.

[18] D. Botstein et al., Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. USA, 1998.

[19] Geoffrey Holmes et al., Experiences with a weighted decision tree learner, 2001.

[20] C. D. Gelatt et al., Optimization by Simulated Annealing, Science, 1983.