Meta Hyperparameter Optimization with Adversarial Proxy Subsets Sampling

Hyperparameter optimization (HPO), which aims to automatically search for optimal hyperparameter configurations, has attracted increasing attention in the machine learning community. HPO generally suffers from high search costs on large-scale real-world datasets, since training a model under each candidate hyperparameter configuration is time-consuming. Existing works sample subsets uniformly to represent the full dataset for HPO, but they ignore the complex and dynamic distributions found in real-world scenarios and leave hyperparameter transfer unexplored. To tackle this problem, we propose a novel meta hyperparameter optimization model with an adversarial proxy subset sampling strategy (Meta-HPO), which transfers hyperparameters optimized on the sampled proxy subsets to the full dataset and further adapts to new data in an out-of-sample updating manner. In particular, a perturbation-aware adversarial sampling strategy is designed to select the proxy subsets that most strongly influence model performance. Using the searched hyperparameter configurations and their corresponding performance scores on the proxy subsets, we propose a meta transfer framework, named "hp-learner", that builds the connection between a dataset's distribution and its optimal hyperparameter configuration. Meta-HPO thus provides a flexible and efficient hyperparameter optimization algorithm. Extensive experiments on real-world datasets validate the advantages of the proposed Meta-HPO model against existing state-of-the-art benchmarks.
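
The abstract describes two components: adversarial proxy subset sampling and an "hp-learner" that maps a subset's distribution to a good hyperparameter configuration. Since no implementation details are given here, the following is only a minimal, hypothetical sketch of how such a pipeline could be wired together; every function name, feature descriptor, surrogate model, and objective below is an illustrative assumption, not the authors' actual algorithm.

```python
# Hypothetical sketch of a Meta-HPO-style pipeline (not the paper's code).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def perturbation_score(x, eval_fn, eps=1e-2):
    """Assumed perturbation-aware score: how much a small perturbation of
    sample x changes the evaluation output (finite-difference sensitivity)."""
    return abs(eval_fn(x + eps) - eval_fn(x - eps)) / (2 * eps)

def adversarial_proxy_subset(X, eval_fn, k):
    """Select the k samples whose perturbation most influences eval_fn,
    as a stand-in for the paper's adversarial sampling strategy."""
    scores = np.array([perturbation_score(x, eval_fn) for x in X])
    return X[np.argsort(scores)[-k:]]

def dataset_features(X):
    """Toy distribution descriptor for a (sub)dataset: per-column mean/std."""
    return np.concatenate([X.mean(axis=0), X.std(axis=0)])

def search_hparams_on_subset(n_trials=20):
    """Placeholder HPO on a proxy subset: random search over one
    hyperparameter (e.g. a learning rate) with a synthetic objective."""
    best_hp, best_score = None, -np.inf
    for _ in range(n_trials):
        hp = rng.uniform(1e-4, 1e-1)
        score = -abs(np.log10(hp) + 2) + rng.normal(scale=0.05)  # fake metric
        if score > best_score:
            best_hp, best_score = hp, score
    return best_hp

# Meta stage: collect (subset distribution -> best hyperparameter) pairs
# from several proxy subsets, then fit the "hp-learner" surrogate.
X_full = rng.normal(size=(1000, 5))
meta_X, meta_y = [], []
for _ in range(10):
    subset = adversarial_proxy_subset(
        X_full, eval_fn=lambda x: np.sin(x).sum(), k=100
    )
    meta_X.append(dataset_features(subset))
    meta_y.append(search_hparams_on_subset())

hp_learner = RandomForestRegressor(n_estimators=50, random_state=0)
hp_learner.fit(np.array(meta_X), np.array(meta_y))

# Transfer: predict a configuration for the full dataset (and, analogously,
# for newly arriving data in an out-of-sample fashion).
predicted_hp = hp_learner.predict(dataset_features(X_full).reshape(1, -1))[0]
print(f"transferred hyperparameter: {predicted_hp:.4g}")
```

In this sketch the hp-learner is a simple regressor over summary statistics; the actual framework would use whatever distribution representation and transfer model the paper defines.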
