Improving Prediction Accuracy for Logistic Regression on Imbalanced Datasets

An imbalanced dataset is one in which a majority class has far more examples than the other classes. Imbalanced datasets are difficult to handle in classification problems, and many classification algorithms perform poorly on them. In this paper, we present a logistic regression analysis in Python on imbalanced datasets and determine classification thresholds according to the class proportions of each dataset.
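
As a rough illustration of choosing a threshold from the class proportion, the following minimal sketch fits a one-feature logistic regression by gradient descent and compares the default 0.5 cutoff with a cutoff equal to the minority-class share. The synthetic data, the 95:5 class ratio, the helper names (fit_logistic, predict_proba, recall), and the rule "threshold = minority proportion" are all assumptions made for this example; they are not necessarily the procedure used in the paper.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the bias; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=300):
    # Batch gradient descent on the mean log loss (hypothetical helper).
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def recall(y_true, y_pred):
    # Fraction of true positives that were predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pos = sum(y_true)
    return tp / pos if pos else 0.0


# Assumed synthetic data: 950 majority (label 0) and 50 minority (label 1) points.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0)] for _ in range(950)]
X += [[rng.gauss(1.5, 1.0)] for _ in range(50)]
y = [0] * 950 + [1] * 50

w = fit_logistic(X, y)
probs = [predict_proba(w, x) for x in X]

# Assumed rule: use the minority-class proportion as the decision threshold.
prevalence = sum(y) / len(y)

pred_default = [1 if p >= 0.5 else 0 for p in probs]
pred_prop = [1 if p >= prevalence else 0 for p in probs]

recall_default = recall(y, pred_default)
recall_prop = recall(y, pred_prop)

print("minority proportion:", prevalence)
print("recall at threshold 0.5:", recall_default)
print("recall at proportion threshold:", recall_prop)
```

Because the proportion-based cutoff is lower than 0.5, every example flagged at 0.5 is also flagged at the lower cutoff, so minority-class recall cannot decrease; the cost is typically more false positives, which should be checked on held-out data rather than on the training set used here.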