Binary categorization of DNA data with unbalanced class distribution for prediction of hepatocellular carcinoma

Real-world data, including experimental data, are generally unbalanced: the classification categories are not approximately equally represented, for reasons such as subject mortality and non-response. The term "unbalanced" in this context refers to the distribution of records among the target classes. Working with unbalanced data has several limitations: it introduces discrepancies in calculating the effective mean, leads to heterogeneity of variance across cells, and makes valid standard error estimates difficult to obtain. This paper investigates classification algorithms and compares their consistency using the Matthews Correlation Coefficient (MCC). With this motive, the authors aim to stress the importance of balanced data for predicting defective and abnormal DNA, which aids in detecting liver ailments leading to hepatocellular carcinoma (liver cancer).
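
As an illustration of why a metric such as the Matthews Correlation Coefficient is useful on unbalanced classes, the following minimal sketch computes it from a binary confusion matrix and contrasts it with plain accuracy. The function names and the confusion-matrix counts are hypothetical, chosen only for illustration; they are not taken from the paper's data or results.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, return 0 when any marginal sum is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all records that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical unbalanced sample: 5 positive records, 95 negative records.
# The classifier finds only 1 of the 5 positives and raises 1 false alarm.
tp, tn, fp, fn = 1, 94, 1, 4

print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

In this made-up case the accuracy is high because the majority class dominates, while the MCC stays low because most positive records are missed. This is the kind of gap that motivates comparing classifiers with MCC when the class distribution is unbalanced.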