An Adaptive Iterative PCA-SVM Based Technique for Dimensionality Reduction to Support Fast Mining of Leukemia Data

The primary goal of a data mining technique is to detect patterns in, and classify data from, a large dataset without compromising the speed of the process. Because data mining extracts patterns from large datasets, pattern discovery and mining are often time-consuming. In any data pattern, each record is represented by several columns, called dimensions, but the identity of the data does not depend equally on each of these dimensions. Scanning and processing the entire dataset for every query therefore not only lowers the efficiency of the algorithm but also slows down processing. This problem can be largely solved by identifying the intrinsic dimensionality of the data and applying classification only to the data along those intrinsic dimensions. Several algorithms have been proposed for identifying and reducing the intrinsic data dimensions. However, reducing the dimensionality affects the classification rate, which may drop because fewer data points are available for each decision. In this work we propose a new technique for classifying leukemia data that identifies and reduces the dimensionality of the training (knowledge) dataset through an iterative process of intrinsic dimensionality discovery and reduction using Principal Component Analysis (PCA). The optimized dataset is then used to classify the given data with Support Vector Machine (SVM) classification. Results show that the proposed technique performs much better both in obtaining an optimized dataset and in classification accuracy.
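The abstract does not spell out the iteration, so what follows is only a minimal Python sketch of the general PCA-then-SVM idea under stated assumptions. It computes PCA through NumPy's SVD. It uses a linear soft-margin SVM trained by plain subgradient descent, which stands in for whatever SVM solver and kernel the authors actually used. It applies a hypothetical stopping rule: grow the number of principal components k one at a time and keep the smallest k that reaches the best validation accuracy. The function names (`pca_fit`, `pca_project`, `train_linear_svm`, `select_components`) and the synthetic data are illustrative assumptions. The data are not leukemia data.

```python
import numpy as np


def pca_fit(X):
    """Return the column means and principal directions of X.

    Rows of the returned matrix are unit-length principal directions,
    ordered by decreasing explained variance.
    """
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt


def pca_project(X, mean, Vt, k):
    """Project X onto its first k principal components."""
    return (X - mean) @ Vt[:k].T


def train_linear_svm(X, y, C=1.0, lr=0.1, epochs=1000):
    """Soft-margin linear SVM trained by full-batch subgradient descent.

    Minimises 0.5*||w||^2 + C * mean(max(0, 1 - y*(X@w + b))), y in {-1, +1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        grad_w = w - C * (X[active].T @ y[active]) / n
        grad_b = -C * y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Label each row +1 or -1 by the sign of the decision function."""
    return np.where(X @ w + b >= 0, 1, -1)


def select_components(X_tr, y_tr, X_val, y_val, k_max):
    """Hypothetical iterative search over k = 1..k_max.

    Keeps the smallest k that reaches the best validation accuracy.
    """
    mean, Vt = pca_fit(X_tr)  # PCA is fitted on the training split only
    best_k, best_acc = 1, -1.0
    for k in range(1, k_max + 1):
        Z_tr = pca_project(X_tr, mean, Vt, k)
        Z_val = pca_project(X_val, mean, Vt, k)
        w, b = train_linear_svm(Z_tr, y_tr)
        acc = float(np.mean(predict(Z_val, w, b) == y_val))
        if acc > best_acc:  # strict '>' keeps the smallest k on ties
            best_k, best_acc = k, acc
    return best_k, best_acc


# Synthetic stand-in for an expression matrix (NOT leukemia data):
# 80 samples x 50 features; the two classes differ only in the first 3 features.
rng = np.random.default_rng(0)
y = np.repeat([-1, 1], 40)
X = rng.normal(size=(80, 50))
X[y == 1, :3] += 3.0

# Even rows for training, odd rows for validation (20 per class in each split).
X_tr, y_tr, X_val, y_val = X[::2], y[::2], X[1::2], y[1::2]
k, acc = select_components(X_tr, y_tr, X_val, y_val, k_max=10)
print(f"selected k={k}, validation accuracy={acc:.3f}")
```

The sketch fits PCA on the training split only and projects the validation split with the same mean and directions, which keeps validation data out of the reduced space. In practice, library implementations such as scikit-learn's `PCA` and `SVC` would replace the hand-written pieces.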
