Learning Abstract Task Representations

A proper form of data characterization can guide the process of learning-algorithm selection and model-performance estimation. The field of meta-learning has provided a rich body of work describing effective forms of data characterization using different families of meta-features (statistical, model-based, information-theoretic, topological, etc.). In this paper, we start with the abundant set of existing meta-features and propose a method to induce new abstract meta-features as latent variables in a deep neural network. We discuss the pitfalls of using traditional meta-features directly and argue for the importance of learning high-level task properties. We demonstrate our methodology using a deep neural network as a feature extractor, and show that 1) induced meta-models mapping abstract meta-features to generalization performance outperform other methods by ~18% on average, and 2) abstract meta-features attain high feature-relevance scores.
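
To make the described pipeline concrete, below is a minimal sketch (in PyTorch, with synthetic stand-in data) of using a deep neural network as a feature extractor over traditional meta-features: a bottleneck layer's activations serve as the abstract meta-features, and a regression head maps them to generalization performance. The architecture, dimensions, and names here are illustrative assumptions, not the paper's exact design; in practice, the input meta-features would come from a characterization tool such as pymfe rather than random tensors.

```python
# Minimal sketch, assuming PyTorch and synthetic data: X holds traditional
# meta-features (one row per dataset), y the observed generalization
# performance of some target learner on each dataset. All names hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_datasets, n_meta_features, n_abstract = 200, 32, 8

X = torch.randn(n_datasets, n_meta_features)  # stand-in meta-features
y = torch.rand(n_datasets, 1)                 # stand-in performance scores

class MetaFeatureExtractor(nn.Module):
    """DNN whose bottleneck activations act as abstract meta-features."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_meta_features, 64), nn.ReLU(),
            nn.Linear(64, n_abstract), nn.ReLU(),  # abstract meta-features
        )
        self.head = nn.Linear(n_abstract, 1)       # performance regressor

    def forward(self, x):
        z = self.encoder(x)      # latent, abstract representation
        return self.head(z), z

model = MetaFeatureExtractor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Train end to end: the performance signal shapes the latent variables.
for _ in range(500):
    opt.zero_grad()
    pred, _ = model(X)
    loss_fn(pred, y).backward()
    opt.step()

# After training, the encoder alone maps any dataset's traditional
# meta-features to its abstract representation.
with torch.no_grad():
    _, abstract_mf = model(X)
print(abstract_mf.shape)  # torch.Size([200, 8])
```

Under this reading, the induced meta-model is the trained head, and the abstract meta-features are whatever the bottleneck learns to encode under the performance-prediction objective; feature-relevance scoring would then be run on those latents rather than on the raw meta-features.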
