Automatic Selection of Classification Algorithms for Non-Experts Using Meta-Features

With the arrival of the big-data era, methods for solving real-world classification problems have attracted much attention from researchers and developers in various fields. In recent years, much effort has been devoted to improving the performance of classification algorithms by adding functions or addressing their weaknesses. However, because a large variety of classification algorithms is available, it is difficult for non-experts to find one that achieves good results on a given data set. A system that automatically selects the best classification algorithm for a given data set would therefore benefit non-experts in various ways, such as saving time and effort. This paper presents a system that predicts the best possible classification algorithm for a given data set with respect to accuracy. To the best of our knowledge, this is the first approach focused on predicting the single best algorithm. The main target users of the proposed system are non-experts who lack knowledge of and experience in data mining. To improve prediction performance, the proposed system uses useful meta-features selected from existing meta-features. The selection is performed by a wrapper approach with a genetic search algorithm. The system then uses the k-nearest neighbor (k-NN) algorithm to learn from the selected meta-features and build a classification model that predicts the best algorithm for new data sets. Experiments on 58 real-world data sets show that the proposed system predicted the best classification algorithm with 60.34% accuracy (top five out of 30 classification algorithms).
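The abstract combines two components: a wrapper that scores candidate meta-feature subsets using a genetic search, and a k-NN meta-learner that maps a data set's meta-features to a predicted best algorithm. The following is a minimal Python sketch of how these pieces could fit together, not the authors' implementation. It assumes meta-features are already extracted and normalized, Euclidean distance with majority vote (k = 3), leave-one-out accuracy on the meta-data as the wrapper fitness, and a simple bitstring genetic algorithm (tournament selection, one-point crossover, bit-flip mutation, elitism). The function names `knn_predict`, `loo_accuracy` and `genetic_select`, all parameter values, and the toy meta-data are hypothetical.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, mask, k=3):
    """Hypothetical k-NN meta-learner: predict the best-algorithm label for `query`."""
    idx = [i for i, on in enumerate(mask) if on]  # meta-features kept by the mask

    def proj(v):
        return [v[i] for i in idx]

    q = proj(query)
    # Rank training data sets by Euclidean distance in the selected subspace.
    order = sorted(range(len(train_X)), key=lambda j: dist(proj(train_X[j]), q))
    # Majority vote; ties go to the label whose first vote came from the closer neighbor.
    votes = Counter(train_y[j] for j in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, mask, k=3):
    """Assumed wrapper fitness: leave-one-out accuracy of k-NN restricted to `mask`."""
    if not any(mask):
        return 0.0
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], mask, k) == y[i]
    return hits / len(X)


def genetic_select(X, y, k=3, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Hypothetical GA wrapper: search bit masks over meta-features, maximizing fitness."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = loo_accuracy(X, y, mask, k)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Toy meta-data: one row per data set; feature 0 is informative, feature 1 is noise.
    X = [[0.1, 0.9], [0.2, 0.1], [0.15, 0.5], [0.05, 0.3],
         [0.8, 0.2], [0.9, 0.8], [0.85, 0.4], [0.95, 0.6]]
    y = ["SVM"] * 4 + ["RandomForest"] * 4
    mask, score = genetic_select(X, y)
    print("selected mask:", mask, "LOO accuracy:", score)
    print("prediction:", knn_predict(X, y, [0.12, 0.7], mask))
```

On the toy data, the wrapper should favor masks that keep the informative first meta-feature. Extraction of the meta-features themselves and the top-five evaluation over 30 algorithms reported in the abstract are outside this sketch.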