Evaluating Model Selection Abilities of Performance Measures

Model selection is an important task in machine learning and data mining. When the holdout method is used for model selection, the consensus in the machine learning community has been that the measure chosen as the model selection goal should also be the measure used to identify the best model on the available data. However, following the preliminary work of Rosset (2004), we show that this is, in general, not true in highly uncertain situations where only very limited data are available. We thoroughly investigate the model selection abilities of different measures in such highly uncertain situations as we vary the model selection goal, the learning algorithm, and the class distribution. The experimental results show that a measure's model selection ability is relatively stable across model selection goals and class distributions. However, different learning algorithms call for different measures in model selection. For SVM and KNN, the measures RMS, SAUC, and MXE generally perform best; for decision trees and naive Bayes, RMS, SAUC, MXE, AUC, and APR generally perform best.
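
To make the experimental setting concrete, the sketch below (not from the paper) illustrates holdout model selection in which the measure used to pick a model may differ from the evaluation goal. The measure definitions follow common conventions: RMS is the root mean squared error between labels and predicted probabilities, and MXE is the mean cross entropy; SAUC is omitted because it has no standard library implementation. The scikit-learn calls and the candidate models are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             average_precision_score, log_loss)

def rms(y_true, p):
    # Root mean squared error between true 0/1 labels and predicted probabilities.
    return np.sqrt(np.mean((y_true - p) ** 2))

# Scoring functions keyed by measure name; lower-is-better measures are
# negated so that a larger score always means a better model.
measures = {
    "ACC": lambda y, p: accuracy_score(y, p >= 0.5),
    "AUC": roc_auc_score,
    "APR": average_precision_score,
    "RMS": lambda y, p: -rms(y, p),
    "MXE": lambda y, p: -log_loss(y, p),  # mean cross entropy
}

X, y = make_classification(n_samples=200, random_state=0)
# A small holdout split mimics the highly uncertain, data-scarce setting.
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.5, random_state=0)

candidates = {
    "SVM": SVC(probability=True, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, score in measures.items():
    # Select the candidate whose holdout score under this measure is best;
    # different measures may disagree on which model to pick.
    best = max(candidates, key=lambda m: score(
        y_ho, candidates[m].fit(X_tr, y_tr).predict_proba(X_ho)[:, 1]))
    print(f"selected by {name}: {best}")
```

Running the selection loop once per measure makes the paper's question explicit: whether the model selected by, say, RMS on the holdout set is also the best model under a different goal such as accuracy or AUC.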