Analyzing gene expression data: Fuzzy decision tree algorithm applied to the classification of cancer data

In data mining, decision tree algorithms are very popular because they have a simple inference mechanism and represent the learned model in a comprehensible form: a decision tree. In recent years, fuzzy decision tree algorithms have been proposed to handle uncertainty in the collected data, and they have been shown to outperform classical decision tree algorithms. This paper investigates a fuzzy decision tree algorithm applied to the classification of gene expression data. The fuzzy decision tree algorithm is compared to a classical decision tree algorithm as well as to other well-known data mining algorithms commonly applied to classification tasks. On the five data sets analyzed, the fuzzy decision tree algorithm outperforms the classical decision tree algorithm. Compared to the other commonly used classification algorithms, both decision tree algorithms are competitive, although neither reaches the accuracy of the best-performing classifier.
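The abstract does not spell out the specific fuzzy decision tree algorithm. As an illustration only, the following is a minimal sketch of one common ingredient of fuzzy ID3-style trees: choosing a split by fuzzy information gain. Unlike a crisp threshold, a fuzzy split sends each sample to every child with a membership degree, so a borderline expression value counts partially toward several branches. The two linear fuzzy sets "low" and "high", the assumption that expression values are normalised to [0, 1], the product t-norm, and all function names are hypothetical choices and do not describe the paper's method.

```python
import math
from collections import defaultdict

# Hypothetical fuzzy partition of one gene's expression value, assumed to be
# min-max normalised to [0, 1]. The "low" and "high" memberships sum to 1.
FUZZY_SETS = {
    "low": lambda x: 1.0 - min(max(x, 0.0), 1.0),
    "high": lambda x: min(max(x, 0.0), 1.0),
}


def fuzzy_entropy(weights, labels):
    """Shannon entropy (bits) of the class distribution, where each sample
    contributes its membership weight instead of a count of 1."""
    totals = defaultdict(float)
    for w, y in zip(weights, labels):
        totals[y] += w
    total = sum(totals.values())
    if total == 0.0:
        return 0.0
    h = 0.0
    for mass in totals.values():
        if mass > 0.0:
            p = mass / total
            h -= p * math.log2(p)
    return h


def fuzzy_split(values, weights, fuzzy_sets=FUZZY_SETS):
    """Send every sample to every child; its child weight is the parent
    weight times its membership in that child's fuzzy set (product t-norm)."""
    return {name: [w * mu(x) for x, w in zip(values, weights)]
            for name, mu in fuzzy_sets.items()}


def fuzzy_information_gain(values, labels, weights=None, fuzzy_sets=FUZZY_SETS):
    """Parent fuzzy entropy minus the mass-weighted fuzzy entropy of the children."""
    if weights is None:
        weights = [1.0] * len(values)  # every sample fully belongs to the root
    parent_mass = sum(weights)
    gain = fuzzy_entropy(weights, labels)
    for child in fuzzy_split(values, weights, fuzzy_sets).values():
        child_mass = sum(child)
        if child_mass > 0.0:
            gain -= (child_mass / parent_mass) * fuzzy_entropy(child, labels)
    return gain


if __name__ == "__main__":
    expression = [0.1, 0.2, 0.8, 0.9]
    classes = ["normal", "normal", "tumour", "tumour"]
    # About 0.39; a crisp threshold at 0.5 would give 1.0 on this toy data.
    print(fuzzy_information_gain(expression, classes))
```

In a full tree learner, a sketch like this would be evaluated for each candidate gene at each node. The child weights returned by `fuzzy_split` would then become the `weights` of the recursive call, so a sample's membership shrinks as it moves down overlapping branches.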
