Simple Modifications to Improve Tabular Neural Networks

There is growing interest in neural network architectures for tabular data. Many general-purpose tabular deep learning models have been introduced recently, with performance sometimes rivaling gradient boosted decision trees (GBDTs). These recent models draw inspiration from diverse sources, including GBDTs, factorization machines, and neural networks from other application domains. Earlier tabular neural networks are also drawn upon, but are arguably under-considered, especially models built for specific tabular problems. This paper focuses on several such models and proposes modifications that improve their performance. With these modifications, the models are shown to be competitive with leading general-purpose tabular models, including GBDTs.
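
To make the setting concrete, here is a minimal PyTorch sketch of the kind of model and modification under discussion: a plain feed-forward network for tabular inputs with one simple, hypothetical modification prepended (an elementwise learned gate followed by LeakyReLU). The names LeakyGate and TabularMLP, and the gate design itself, are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class LeakyGate(nn.Module):
    # Hypothetical "simple modification": an elementwise learned scaling
    # and shift followed by LeakyReLU, placed in front of an MLP.
    def __init__(self, num_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.act = nn.LeakyReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x * self.weight + self.bias)

class TabularMLP(nn.Module):
    # Minimal feed-forward baseline for tabular inputs.
    def __init__(self, num_features: int, hidden: int = 64, out_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            LeakyGate(num_features),
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: a batch of 32 rows with 10 numeric features.
model = TabularMLP(num_features=10)
logits = model(torch.randn(32, 10))

The gate adds only 2 * num_features parameters, which is in the spirit of a lightweight, drop-in change to an existing tabular architecture.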
