Data preparation for pre-processing on oral cancer dataset

This paper presents the data pre-processing tasks applied to an oral cancer dataset: data interpretation, data integration, and the handling of noisy, missing, and inconsistent data. The prepared dataset includes all the fields required for research on oral cancer diagnosis, covering demographics, social habits, clinical symptoms, and histological variables. After data normalization and transformation, the study yields a prepared oral cancer dataset with 27 attributes as part of its contribution. Only two of these variables, case_id and age, are continuous or numerical; the remaining variables are discrete or categorical.
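
As an illustration of the kind of cleaning steps listed above, the following is a minimal sketch rather than the procedure used in the study. Only the attribute names case_id and age come from the text; the gender and smoking fields, the sample values, the plausible age range of 0 to 120, median imputation, and min-max scaling are all assumptions made for the example.

```python
from statistics import median

# Hypothetical sample records. Only "case_id" and "age" are named in the text;
# "gender" and "smoking" are illustrative stand-ins for categorical attributes.
records = [
    {"case_id": 1, "age": 45, "gender": "Male ", "smoking": "yes"},
    {"case_id": 2, "age": None, "gender": "female", "smoking": "No"},
    {"case_id": 3, "age": 230, "gender": "F", "smoking": "YES"},
    {"case_id": 4, "age": 65, "gender": "m", "smoking": None},
]

# Assumed mapping that resolves inconsistent spellings of the same category.
GENDER_MAP = {"m": "male", "male": "male", "f": "female", "female": "female"}


def clean_category(value, mapping=None, missing="unknown"):
    """Trim and lowercase a label, optionally mapping synonyms to one value."""
    if value is None:
        return missing
    key = str(value).strip().lower()
    if mapping is None:
        return key
    return mapping.get(key, missing)


def preprocess(rows, age_range=(0, 120)):
    """Return cleaned copies of the rows; the input rows are left unchanged."""
    low, high = age_range
    cleaned = []
    for row in rows:
        new = dict(row)
        # Noisy data: an age outside the assumed plausible range becomes missing.
        if new["age"] is not None and not (low <= new["age"] <= high):
            new["age"] = None
        # Inconsistent data: unify categorical labels.
        new["gender"] = clean_category(new["gender"], GENDER_MAP)
        new["smoking"] = clean_category(new["smoking"])
        cleaned.append(new)

    # Missing data: impute age with the median of the known ages (an assumption).
    known = [r["age"] for r in cleaned if r["age"] is not None]
    if not known:
        raise ValueError("no valid age values to impute from")
    fill = median(known)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill

    # Normalization: min-max scale age into [0, 1].
    ages = [r["age"] for r in cleaned]
    lo, hi = min(ages), max(ages)
    span = (hi - lo) or 1  # avoid division by zero when all ages are equal
    for r in cleaned:
        r["age_scaled"] = (r["age"] - lo) / span
    return cleaned


if __name__ == "__main__":
    for row in preprocess(records):
        print(row)
```

In this sketch the identifier case_id is left untouched, since it labels a case rather than measuring anything, and the scaled age is stored in a separate field so the original value remains available.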