An Approach to overcome Imbalance Datasets of Eukaryotic Genomes during the Analysis by Machine Learning Technique (SVM)

In biology, the Support Vector Machine (SVM) is one of the most frequently used tools for the analysis of gene expression, microarray experiments, and other biological applications. In a human genome dataset, only a small proportion of the DNA sequences represent genes; the rest do not. In this work, we highlight the reasons why SVM in particular fails on such imbalanced data and what can be done to overcome this.