Handling class imbalance in customer behavior prediction

Class imbalance is a common problem in real-world applications, and it significantly affects prediction accuracy. This study investigates how to better handle class imbalance in customer behavior prediction. Using a more appropriate evaluation metric, the area under the ROC curve (AUC), we measured the performance gains from under-sampling and from two machine learning algorithms (weighted Random Forests and RUSBoost) relative to a benchmark of plain Random Forests. The results show that under-sampling is the most effective way to deal with class imbalance. RUSBoost, an algorithm designed specifically for the class imbalance problem, is also effective but not as good as under-sampling. Weighted Random Forests, a cost-sensitive learner, improves performance on only one of the three classification problems, appetency.
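As a rough illustration of two ideas named above, the following minimal sketch shows random under-sampling of the majority class and a pairwise computation of AUC. It is not the study's implementation: the function names `random_undersample` and `auc`, the 1:1 target class ratio, and the toy data are all assumptions made for this example.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    The 1:1 target ratio is an assumption for this sketch, not a setting
    taken from the study.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 2 positives among 10 rows.
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
rows = [[i] for i in range(10)]
balanced_rows, balanced_labels = random_undersample(rows, labels, seed=42)

# A model that scores every row identically is 80% accurate here if it
# always predicts the majority class, yet its AUC is only 0.5.
constant_auc = auc(labels, [0.0] * len(labels))
```

On this toy data the constant scorer illustrates why accuracy can mislead under imbalance while AUC does not: predicting the majority class everywhere looks accurate but has no ranking ability.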