Automatic recommendation of classification algorithms based on data set characteristics

Choosing an appropriate classification algorithm for a given data set is important and useful in practice, but it is also challenging. In this paper, a method for recommending classification algorithms is proposed. First, the feature vectors of known data sets are extracted with a novel method, and the performance of classification algorithms on those data sets is evaluated. Then, the feature vector of a new data set is extracted and its k nearest data sets are identified. Finally, the classification algorithms that perform well on those nearest data sets are recommended for the new data set. The proposed feature extraction method characterizes data sets with both structural and statistical information, which distinguishes it from existing methods. To evaluate the proposed recommendation method and the feature extraction method, extensive experiments were conducted on 84 publicly available UCI data sets with 17 different types of classification algorithms, three different types of data set characterization methods, and all possible numbers of nearest data sets. The results indicate that the proposed method is effective and can be used in practice.
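As a minimal sketch of the recommendation step only (not the paper's actual implementation), the following Python function assumes the meta-features have already been extracted and normalized, uses plain Euclidean distance as the similarity measure, and ranks algorithms by their mean accuracy on the k nearest data sets. The function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def recommend_algorithms(meta_features, performance, new_features, k=3, top_n=1):
    """Recommend classification algorithms for a new data set.

    meta_features : (n_datasets, n_features) array of data set feature vectors
    performance   : (n_datasets, n_algorithms) array of evaluated accuracies
    new_features  : (n_features,) feature vector of the new data set
    Returns the indices of the top_n algorithms, ranked by mean accuracy
    over the k nearest data sets.
    """
    # Euclidean distance from the new data set to every known data set
    # (assumed metric; the paper's similarity measure may differ)
    dists = np.linalg.norm(meta_features - new_features, axis=1)
    nearest = np.argsort(dists)[:k]                # k nearest data sets
    mean_acc = performance[nearest].mean(axis=0)   # average per algorithm
    return np.argsort(mean_acc)[::-1][:top_n]      # best algorithms first

# Toy example: 4 known data sets, 3 candidate algorithms, 2 meta-features
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
P = np.array([[0.70, 0.90, 0.60],
              [0.72, 0.88, 0.65],
              [0.85, 0.60, 0.80],
              [0.83, 0.62, 0.78]])
print(recommend_algorithms(X, P, np.array([0.15, 0.85]), k=2))  # -> [1]
```

In the toy example the new data set is closest to the first two known data sets, on which algorithm 1 has the highest mean accuracy, so it is recommended; in the paper the same idea is driven by the proposed structural and statistical data set characterization.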
