Effect of the Sampling of a Dataset in the Hyperparameter Optimization Phase over the Efficiency of a Machine Learning Algorithm

Selecting the best configuration of hyperparameter values for a Machine Learning model directly affects the performance of the model on the dataset. It is a laborious task that usually requires deep knowledge of both the hyperparameter optimization methods and the Machine Learning algorithms. Although several automatic optimization techniques exist, they usually consume significant resources, increasing the dynamic complexity in order to obtain high accuracy. Since the available dataset is one of the most critical factors in this computational cost, in this paper we study the effect of using different partitions of a dataset in the hyperparameter optimization phase on the efficiency of a Machine Learning algorithm. Nonparametric inference is used to measure how often the accuracy, time, and spatial complexity obtained with the partitions differ from those obtained with the whole dataset. In addition, a level of gain is assigned to each partition, allowing us to study patterns and identify which samples are more profitable. Since Cybersecurity is a discipline in which the efficiency of Artificial Intelligence techniques is key to extracting actionable knowledge, the statistical analyses have been carried out over five Cybersecurity datasets.
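The kind of comparison described above can be illustrated with a minimal sketch. It is not the paper's experimental setup: the synthetic two-class data, the k-nearest-neighbours classifier, the grid of k values, the partition fractions, and every function name below are assumptions chosen only to keep the example self-contained. The sketch tunes one hyperparameter on random partitions of the training data, records the tuning time, and then evaluates the selected value on a held-out test set.

```python
import random
import time


def make_data(n, seed=0):
    """Synthetic two-class data: two noisy 2-D Gaussian blobs (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 * label
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda s: (s[0][0] - point[0]) ** 2 + (s[0][1] - point[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, test, k):
    """Fraction of test points classified correctly by k-NN fitted on train."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def tune_on_partition(train, fraction, candidates, seed=0):
    """Grid-search k on a random fraction of train; return (best_k, seconds)."""
    rng = random.Random(seed)
    sample = rng.sample(train, max(10, int(fraction * len(train))))
    split = int(0.7 * len(sample))
    fit, val = sample[:split], sample[split:]
    start = time.perf_counter()
    best_k = max(candidates, key=lambda k: accuracy(fit, val, k))
    return best_k, time.perf_counter() - start


def compare_partitions(n=400, fractions=(0.25, 0.5, 1.0), candidates=(1, 3, 5, 7, 9)):
    """Tune on each partition size, then score the chosen k on a held-out test set."""
    data = make_data(n)
    cut = int(0.75 * n)
    train, test = data[:cut], data[cut:]
    results = []
    for fraction in fractions:
        best_k, seconds = tune_on_partition(train, fraction, candidates)
        results.append({
            "fraction": fraction,
            "best_k": best_k,
            "tuning_seconds": seconds,
            "test_accuracy": accuracy(train, test, best_k),
        })
    return results


if __name__ == "__main__":
    for row in compare_partitions():
        print(row)
```

Repeating such runs over several seeds and datasets would produce the paired accuracy and time measurements that a nonparametric test could then compare between each partition and the whole dataset.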
