Hyperparameter Optimization for Unsupervised Outlier Detection

Given an unsupervised outlier detection (OD) algorithm, how can we optimize its hyperparameter(s) (HP) on a new dataset, without any labels? In this work, we address this challenging problem of hyperparameter optimization for unsupervised OD, and propose the first systematic approach, called HPOD, based on meta-learning. HPOD capitalizes on the prior performance of a large collection of HPs on existing OD benchmark datasets, and transfers this information to evaluate HPs on a new dataset without labels. Moreover, HPOD adapts a prominent sampling paradigm to identify promising HPs efficiently. Extensive experiments show that HPOD works with both deep (e.g., Robust AutoEncoder) and shallow (e.g., Local Outlier Factor (LOF) and Isolation Forest (iForest)) OD algorithms, on both discrete and continuous HP spaces, and outperforms a wide range of baselines, improving on the default HPs of LOF and iForest by 58% and 66% on average, respectively.
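To make the transfer idea concrete, the sketch below shows one plausible shape a meta-learned HP evaluator could take. It is not HPOD itself: the choice of meta-features, the random-forest surrogate, LOF's n_neighbors as the HP being tuned, and ROC-AUC as the performance measure are all illustrative assumptions. Offline, a surrogate is trained on historical labeled benchmarks to map (dataset meta-features, HP) pairs to measured detection performance; online, it scores candidate HPs on a new, unlabeled dataset, so no labels are needed where it matters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor


def meta_features(X):
    """Hypothetical meta-features: crude summary statistics of a dataset."""
    return np.array([
        X.shape[0], X.shape[1], X.mean(), X.std(),
        np.percentile(X, 25), np.percentile(X, 75),
    ])


def train_surrogate(benchmarks, hp_grid):
    """Offline meta-training on historical benchmarks: (X, y) pairs with
    ground-truth outlier labels. Learns (meta-features, HP) -> performance."""
    rows, perf = [], []
    for X, y in benchmarks:
        mf = meta_features(X)
        for k in hp_grid:
            lof = LocalOutlierFactor(n_neighbors=k).fit(X)
            scores = -lof.negative_outlier_factor_  # higher = more outlying
            rows.append(np.append(mf, k))
            perf.append(roc_auc_score(y, scores))  # labels used only offline
    return RandomForestRegressor(n_estimators=100, random_state=0).fit(
        np.array(rows), np.array(perf))


def select_hp(surrogate, X_new, hp_grid):
    """Online: rank candidate HPs on a new, unlabeled dataset using only
    the surrogate's predicted performance; no labels for X_new are needed."""
    mf = meta_features(X_new)
    preds = [surrogate.predict(np.append(mf, k)[None, :])[0] for k in hp_grid]
    return hp_grid[int(np.argmax(preds))]
```

Note that select_hp exhaustively scores every candidate for brevity; the "prominent sampling paradigm" the abstract refers to would instead concentrate evaluations on promising HPs sequentially, which matters once the HP space is large or continuous.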
