ANALYSIS OF FEATURE SELECTION WITH CLASSIFICATION: BREAST CANCER DATASETS

Classification, a data mining task, is an effective way to categorize data in the process of Knowledge Discovery in Databases (KDD). Decision tree algorithms, one family of classification methods, are widely used in the medical field to classify medical data for diagnosis. Feature selection can increase the accuracy of a classifier because it eliminates irrelevant attributes. This paper analyzes the performance of the decision tree classifier CART, with and without feature selection, in terms of accuracy, time to build the model, and size of the tree on various breast cancer datasets. The results show that a particular feature selection method used with CART enhanced the classification accuracy on a particular dataset.
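The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's experimental setup: the data are synthetic (two informative attributes plus four noise attributes) rather than a breast cancer dataset, the tree is a simplified CART-style learner that uses Gini impurity and a depth limit instead of cost-complexity pruning, and `select_features` is a hypothetical toy filter standing in for whichever feature selection methods the paper evaluates. All function names are made up for this example.

```python
import random
import time
from collections import Counter

def gini(labels):
    """Gini impurity, the split criterion CART uses for classification."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return (weighted_gini, feature, threshold) of the best binary split, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, features, depth=0, max_depth=4):
    """Grow a binary tree; leaves are class labels, inner nodes are tuples."""
    split = best_split(X, y, features) if depth < max_depth and gini(y) > 0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], features, depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], features, depth + 1, max_depth))

def predict(node, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def tree_size(node):
    """Total number of nodes (internal nodes plus leaves)."""
    if not isinstance(node, tuple):
        return 1
    return 1 + tree_size(node[2]) + tree_size(node[3])

def select_features(X, y, k):
    """Toy filter (assumption): keep the k features whose best single split is purest."""
    scores = []
    for f in range(len(X[0])):
        split = best_split(X, y, [f])
        scores.append((split[0] if split else 1.0, f))
    return sorted(f for _, f in sorted(scores)[:k])

def make_data(n, rng):
    """Synthetic stand-in data: features 0-1 are informative, 2-5 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def evaluate(X_tr, y_tr, X_te, y_te, features):
    """Report the three measures named in the abstract for one feature set."""
    start = time.perf_counter()
    tree = build_tree(X_tr, y_tr, features)
    build_seconds = time.perf_counter() - start
    correct = sum(predict(tree, row) == lab for row, lab in zip(X_te, y_te))
    return {"accuracy": correct / len(y_te),
            "build_seconds": build_seconds,
            "tree_size": tree_size(tree)}

rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)

all_features = list(range(len(X_train[0])))
selected = select_features(X_train, y_train, k=2)

results = {
    "all features": evaluate(X_train, y_train, X_test, y_test, all_features),
    "selected features": evaluate(X_train, y_train, X_test, y_test, selected),
}
for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.3f}, "
          f"build_seconds={r['build_seconds']:.4f}, tree_size={r['tree_size']}")
```

Running the script prints accuracy, build time, and tree size for both configurations. Any difference seen on this synthetic data says nothing about real breast cancer datasets; the sketch only shows how the three measures can be collected side by side.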
