AutoML with Bayesian Optimization for Big Data Management

The field of automated machine learning (AutoML) has gained significant attention in recent years due to its ability to automate the building and optimization of machine learning models. However, the growing volume of data being generated presents new challenges for AutoML systems in terms of big data management. In this paper, we present Fabolas and learning curve extrapolation as two methods for accelerating hyperparameter optimization. We then present four methods for accelerating training: the Bag of Little Bootstraps, k-means clustering for support vector machines, subsample size selection for gradient descent, and optimal subsampling for logistic regression. Additionally, we discuss the use of Markov chain Monte Carlo (MCMC) methods and other stochastic optimization techniques to improve the efficiency of AutoML systems in managing big data. Because these methods accelerate different facets of the training process, they can be combined in diverse ways to gain further speedups; we review several promising combinations and provide a comprehensive picture of the current state of AutoML and its potential for managing big data in various industries. Finally, we highlight the importance of parallel computing and distributed systems for improving the scalability of AutoML systems when working with big data.
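
To make the resampling idea concrete, the following is a minimal Python sketch of the Bag of Little Bootstraps for a scalar statistic. The toy dataset, the `weighted_mean` statistic, and all function names are illustrative assumptions for this sketch, not artifacts of the methods surveyed here.

```python
import numpy as np

def bag_of_little_bootstraps(data, estimator, n_subsamples=10, gamma=0.7,
                             n_resamples=50, rng=None):
    """Bag of Little Bootstraps: estimate the sampling variability of
    `estimator` on a large dataset without materializing full-size resamples.

    data      : 1-D array of observations (toy setting)
    estimator : function mapping (values, weights) -> scalar statistic
    gamma     : subsample-size exponent, b = n**gamma
    """
    rng = np.random.default_rng(rng)
    n = len(data)
    b = int(n ** gamma)                      # "little" subsample size
    per_subsample_se = []
    for _ in range(n_subsamples):
        subsample = rng.choice(data, size=b, replace=False)
        stats = []
        for _ in range(n_resamples):
            # Multinomial counts emulate an n-sized resample of the b points,
            # so each estimator call only touches b observations.
            weights = rng.multinomial(n, np.full(b, 1.0 / b))
            stats.append(estimator(subsample, weights))
        per_subsample_se.append(np.std(stats, ddof=1))
    # Average the quality assessment (here, a standard error) across subsamples.
    return float(np.mean(per_subsample_se))

# Toy usage: standard error of a weighted mean over one million points.
def weighted_mean(values, weights):
    return np.average(values, weights=weights)

x = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=1_000_000)
print(bag_of_little_bootstraps(x, weighted_mean, rng=0))  # ~ 2/sqrt(n) = 0.002
```

In this sketch, gamma trades fidelity for cost: values closer to 1 make each subsample more representative of the full dataset, while smaller values shrink the per-resample workload, which is the source of the method's speedup on big data.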
