Dimensionality Reduction for Efficient Classification of DNA Repair Genes

DNA damage plays a crucial role in ageing, which motivates classifying DNA repair genes as ageing-related or non-ageing-related. In this paper, we employ a data mining approach to classify DNA repair genes using their characteristic features. The classification models built on the full gene dataset were difficult to analyze and interpret because of the curse of dimensionality. We overcome this difficulty with dimensionality reduction, a well-known pre-processing technique. Feature subset selection, combined with various search methods, is used to reduce the dataset without affecting the integrity of the original data. The reduced dataset made it practical to use a multilayer perceptron for efficient analysis. The classifiers performed better on the reduced dataset than on the original dataset.
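
To illustrate what feature subset selection with a search method can look like, the following is a minimal sketch and not the procedure used in the paper. It assumes a greedy forward search as the search method, leave-one-out accuracy of a simple nearest-centroid classifier as the subset evaluator (standing in for a real classifier such as a multilayer perceptron), and a synthetic two-class dataset in place of the gene data. All function names and parameters are hypothetical.

```python
import random


def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature indices (assumed evaluator)."""
    if not features:
        return 0.0
    n = len(X)
    correct = 0
    for i in range(n):
        # Build per-class feature sums from every sample except sample i.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            acc = sums.setdefault(y[j], [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1
        # Predict the class whose centroid is closest (squared Euclidean).
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            dist = sum(
                (X[i][f] - acc[k] / counts[label]) ** 2
                for k, f in enumerate(features)
            )
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / n


def greedy_forward_selection(X, y, min_gain=1e-9):
    """Add one feature at a time, keeping the one that most improves
    the evaluator score; stop when no candidate improves it."""
    n_features = len(X[0])
    selected, best_score = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [
            (nearest_centroid_loo_accuracy(X, y, selected + [f]), f)
            for f in candidates
        ]
        # Ties are broken in favour of the lowest feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score - best_score <= min_gain:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score


def make_toy_data(n_per_class=30, n_noise=8, seed=0):
    """Synthetic stand-in for a gene dataset: features 0 and 1 carry
    class signal, the remaining features are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 1.0),
                           rng.gauss(-3.0 * label, 1.0)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


X, y = make_toy_data()
selected, score = greedy_forward_selection(X, y)
full_score = nearest_centroid_loo_accuracy(X, y, list(range(len(X[0]))))
print("selected features:", selected)
print("LOO accuracy on subset:", round(score, 3))
print("LOO accuracy on all features:", round(full_score, 3))
```

The wrapper-style evaluator here is one design choice among several; a filter-style criterion (for example, one based on feature-class correlation) or a different search strategy could be substituted without changing the overall structure of the search loop.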