Data complexity meta-features for regression problems

In meta-learning, classification problems can be described by a variety of features, including complexity measures. These measures capture the complexity of the frontier that separates the classes. For regression problems, on the other hand, such measures are lacking. This paper presents and analyses measures devoted to estimating the complexity of the function that should be fitted to the data in regression problems. As case studies, they are employed as meta-features in three meta-learning setups: (i) the first predicts the regression function type of some synthetic datasets; (ii) the second is designed to tune the parameter values of support vector regressors; and (iii) the third aims to predict the performance of various regressors for a given dataset. The results show that the new measures are suitable for describing regression datasets and useful in the meta-learning tasks considered. In cases (ii) and (iii), the results achieved are also similar to or better than those obtained using classical meta-features.
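
To make the idea of a regression complexity meta-feature concrete, the Python sketch below computes two simple measures for a dataset (X, y). They are illustrative assumptions in the spirit of correlation-based and linearity-based measures, not necessarily the exact definitions proposed in this paper, and the function names `max_feature_correlation` and `linear_fit_error` are hypothetical. A high feature-target correlation or a low linear-fit error both suggest a simpler function to fit.

```python
import numpy as np


def _ranks(v: np.ndarray) -> np.ndarray:
    # Rank transform (0..n-1). Ties are broken by position, a simplification.
    return np.argsort(np.argsort(v, kind="stable"), kind="stable").astype(float)


def max_feature_correlation(X: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical measure: largest absolute Spearman-style correlation
    between any single input feature and the target.

    Values near 1 suggest one feature alone (monotonically) explains y.
    """
    ry = _ranks(y)
    best = 0.0
    for j in range(X.shape[1]):
        rho = np.corrcoef(_ranks(X[:, j]), ry)[0, 1]
        best = max(best, abs(rho))
    return float(best)


def linear_fit_error(X: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical measure: mean absolute residual of an ordinary least
    squares fit on min-max normalised data.

    Values near 0 suggest an (almost) linear input-output relation.
    """
    def minmax(a: np.ndarray) -> np.ndarray:
        lo = a.min(axis=0)
        span = a.max(axis=0) - lo
        span = np.where(span == 0, 1.0, span)  # guard constant columns
        return (a - lo) / span

    Xn, yn = minmax(X), minmax(y)
    A = np.column_stack([Xn, np.ones(len(yn))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, yn, rcond=None)
    return float(np.mean(np.abs(yn - A @ coef)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 3))
    y_linear = 2 * X[:, 0] - X[:, 1]
    y_nonlinear = np.sin(6 * X[:, 0]) * X[:, 1]
    for name, y in [("linear", y_linear), ("nonlinear", y_nonlinear)]:
        print(name, max_feature_correlation(X, y), linear_fit_error(X, y))
```

On the synthetic data above, the linear target gives a linear-fit error close to zero, while the sin(6x0)·x1 target gives a clearly larger error and a low maximum correlation. In a meta-learning setup, one such vector of measures per dataset would form the input a meta-learner uses to predict, for example, a suitable regressor or its expected performance.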
