Impact of Non-Proportional Training Sampling of Imbalanced Classes on Land Cover Classification Accuracy with See5 Decision Tree

The accuracy of a supervised classification depends heavily on its training samples. This paper examines how non-proportional training data sampling of imbalanced classes affects land cover classification accuracy with the See5 decision tree classifier. Its purposes are 1) to examine experimentally how the training sampling ratio affects classification accuracy when classes are imbalanced; and 2) to determine the training data sampling ratio that gives the best classification performance with a See5 decision tree classifier. To measure classification accuracy more faithfully, we propose a balanced accuracy measure for a targeted class that incorporates both False Positive and False Negative errors. The results indicate that balancing the training sample between classes does not necessarily improve classification accuracy. Instead, a training sample ratio equal to the actual ratio of the imbalanced classes' coverage yields the best classification performance.
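To illustrate why a targeted-class measure should account for both error types, the following minimal sketch computes per-class scores from reference and predicted labels. The abstract does not state the formula of the proposed measure, so the `combined` score below, TP / (TP + FP + FN), is only an assumed stand-in that penalizes both False Positives and False Negatives; the function name `target_class_scores` and the toy labels are likewise hypothetical.

```python
from typing import Dict, Hashable, Sequence


def target_class_scores(
    truth: Sequence[Hashable],
    predicted: Sequence[Hashable],
    target: Hashable,
) -> Dict[str, float]:
    """Score one targeted class from reference and predicted labels.

    The 'combined' score is an assumed stand-in, not the paper's formula:
    TP / (TP + FP + FN), which drops when either error type grows.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    tp = fp = fn = 0
    for t, p in zip(truth, predicted):
        if p == target and t == target:
            tp += 1  # correctly labeled as the target class
        elif p == target:
            fp += 1  # False Positive: labeled target, actually another class
        elif t == target:
            fn += 1  # False Negative: target pixel labeled as another class

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return {
        "overall_accuracy": ratio(correct, len(truth)),
        "users_accuracy": ratio(tp, tp + fp),      # sensitive to FP only
        "producers_accuracy": ratio(tp, tp + fn),  # sensitive to FN only
        "combined": ratio(tp, tp + fp + fn),       # sensitive to both
    }


# Toy imbalanced scene: 2 "urban" pixels among 10 (hypothetical data).
truth = ["urban", "urban"] + ["other"] * 8
predicted = ["urban", "other", "urban"] + ["other"] * 7
scores = target_class_scores(truth, predicted, "urban")
print(scores)
```

In this toy case the overall accuracy is 0.8 even though only one of the two minority-class pixels is found and one false alarm is raised, which is the kind of masking a targeted-class measure is meant to expose.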