Feature selection of breast cancer based on Principal Component Analysis

Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques to let computers learn from past examples and detect patterns in large data sets, which makes it particularly well suited to assisting medical practitioners in diagnosing disease from a variety of test results. In this research, we therefore develop a feature extraction algorithm based on Principal Component Analysis (PCA), with an Artificial Neural Network (ANN) as the classifier, to enhance the classification of cases as benign or malignant on the Wisconsin Breast Cancer Database. In addition, three rules of thumb of PCA, namely the Scree Test, Cumulative Variance and the Kaiser-Guttman (KG) rule, are employed for feature selection. An ensemble of the reduced datasets based on these rules is used as the input to an ANN classifier trained with the back-propagation algorithm. Initial results show that this approach is able to discriminate between normal and breast cancer patients.
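As a rough illustration of how the three rules of thumb can each suggest a number of principal components to retain, the following Python sketch applies them to the eigenvalues of a correlation matrix. It is not the procedure used in the study: the function names, the synthetic data, and the 90% cumulative-variance threshold are all assumptions made for the example. The Scree Test is normally a visual inspection of the eigenvalue plot, so the "largest gap" heuristic below is only a crude stand-in for it.

```python
import numpy as np


def correlation_eigenvalues(X):
    """Eigenvalues of the feature correlation matrix, largest first."""
    corr = np.corrcoef(X, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order for symmetric matrices
    return np.linalg.eigvalsh(corr)[::-1]


def kaiser_guttman(eigvals):
    """KG rule: keep components whose eigenvalue exceeds 1."""
    return int(np.sum(np.asarray(eigvals) > 1.0))


def cumulative_variance(eigvals, threshold=0.90):
    """Smallest number of components whose cumulative share of variance
    reaches `threshold` (0.90 is an assumed value, not from the paper)."""
    eigvals = np.asarray(eigvals, dtype=float)
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, threshold)) + 1
    return min(k, len(eigvals))


def scree_largest_gap(eigvals):
    """Crude proxy for the Scree Test: keep the components that come
    before the largest drop between consecutive eigenvalues."""
    eigvals = np.asarray(eigvals, dtype=float)
    drops = eigvals[:-1] - eigvals[1:]
    return int(np.argmax(drops)) + 1


if __name__ == "__main__":
    # Synthetic stand-in data: 200 samples, 9 correlated features.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 3))
    X = latent @ rng.normal(size=(3, 9)) + 0.5 * rng.normal(size=(200, 9))

    eig = correlation_eigenvalues(X)
    print("KG rule:", kaiser_guttman(eig))
    print("Cumulative variance:", cumulative_variance(eig))
    print("Scree (largest gap):", scree_largest_gap(eig))
```

Each rule may return a different component count, which is why the reduced datasets can differ from rule to rule. The data would then be projected onto the retained components before being passed to a classifier.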
