Dynamic Feature Engineering and Model Selection Methods for Temporal Tabular Datasets with Regime Changes

The application of deep learning algorithms to temporal panel datasets is difficult due to heavy non-stationarities, which can lead to over-fitted models that under-perform under regime changes. In this work we propose a new machine-learning pipeline for ranking predictions on temporal panel datasets that is robust under regime changes in the data. Different machine-learning models, including Gradient Boosting Decision Trees (GBDTs) and neural networks with and without simple feature engineering, are evaluated in the pipeline under different settings. We find that GBDT models with dropout offer high performance, robustness and generalisability at relatively low complexity and computational cost. We then show that online learning techniques can be applied in post-prediction processing to enhance the results. In particular, dynamic feature neutralisation, an efficient procedure that requires no retraining and can be applied post-prediction to any machine-learning model, improves robustness by reducing drawdown during regime changes. Furthermore, we demonstrate that building model ensembles through dynamic model selection, based on recent model performance, improves the Sharpe and Calmar ratios of out-of-sample predictions over the baseline. Finally, we evaluate the robustness of the pipeline across different data splits and random seeds, and obtain good reproducibility of results.
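As a concrete illustration of a GBDT with dropout, the sketch below configures DART-style boosting in LightGBM, in which trees are randomly dropped during boosting, regularising the ensemble analogously to dropout in neural networks. The hyperparameter values are illustrative assumptions, not the settings used in the paper.

```python
import lightgbm as lgb

# GBDT with dropout: DART randomly drops trees during boosting.
model = lgb.LGBMRegressor(
    boosting_type="dart",  # enable DART-style tree dropout
    drop_rate=0.1,         # illustrative value, not taken from the paper
    n_estimators=500,
    learning_rate=0.05,
)
# model.fit(X_train, y_train)  # X_train, y_train: one temporal split
```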
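Feature neutralisation, in its simplest linear form, subtracts from the predictions their least-squares projection onto the feature matrix, removing the predictions' linear exposure to the features. A minimal NumPy sketch follows; the function name, the `proportion` parameter and the per-era usage are illustrative assumptions, and the dynamic variant described above would additionally restrict `features` to a subset chosen from recent data rather than the full feature matrix.

```python
import numpy as np

def neutralise(predictions: np.ndarray, features: np.ndarray,
               proportion: float = 1.0) -> np.ndarray:
    """Remove (a fraction of) the linear exposure of predictions to features.

    predictions: (n_samples,) model outputs for a single era
    features:    (n_samples, n_features) feature matrix for the same era
    proportion:  fraction of the linear exposure to subtract
    """
    # Least-squares projection of the predictions onto the feature subspace
    exposure = features @ (np.linalg.pinv(features) @ predictions)
    neutral = predictions - proportion * exposure
    # Re-standardise so scores remain comparable across eras
    return neutral / neutral.std()
```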
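Dynamic model selection based on recent performance can be sketched as ranking candidate models by the Sharpe ratio of their per-era scores over a trailing window and equal-weighting the top performers. The helper below is a hypothetical illustration, not the paper's exact procedure; `era_scores` is assumed to hold one score (e.g. a rank correlation) per model per recent era.

```python
import numpy as np

def select_top_models(era_scores: np.ndarray, k: int = 3) -> np.ndarray:
    """era_scores: (n_models, n_recent_eras) per-era scores per model.
    Returns the indices of the k models with the highest trailing Sharpe."""
    sharpe = era_scores.mean(axis=1) / era_scores.std(axis=1)
    return np.argsort(sharpe)[-k:]

# Equal-weight ensemble of the selected models' current-era predictions:
# preds has shape (n_models, n_samples)
# ensemble = preds[select_top_models(era_scores)].mean(axis=0)
```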
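The evaluation metrics named above can be written down directly for a sequence of per-era scores. The raw, un-annualised ratios below are an assumption, since the paper's exact conventions (e.g. any annualisation factor) are not stated in the abstract.

```python
import numpy as np

def sharpe(era_scores: np.ndarray) -> float:
    # Mean per-era score divided by its standard deviation
    return era_scores.mean() / era_scores.std()

def calmar(era_scores: np.ndarray) -> float:
    # Mean per-era score divided by the maximum drawdown
    # of the cumulative score curve
    cumulative = era_scores.cumsum()
    max_drawdown = (np.maximum.accumulate(cumulative) - cumulative).max()
    return era_scores.mean() / max_drawdown if max_drawdown > 0 else float("inf")
```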
