Net-DNF: Effective Deep Modeling of Tabular Data

A challenging open question in deep learning is how to handle tabular data. Unlike domains such as image and natural language processing, where deep architectures prevail, there is still no widely accepted neural architecture that dominates tabular data. As a step toward bridging this gap, we present Net-DNF, a novel generic architecture whose inductive bias elicits models structured as logical Boolean formulas in disjunctive normal form (DNF) over affine soft-threshold decision terms. Net-DNFs also promote localized decisions that are made over small subsets of the features. We present extensive experiments showing that Net-DNFs significantly and consistently outperform fully connected networks on tabular data. With relatively few hyperparameters, Net-DNFs open the door to practical end-to-end handling of tabular data with neural networks. We also present ablation studies that justify the design choices of Net-DNF, namely its inductive bias elements: Boolean formulation, locality, and feature selection.
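To make the DNF-over-soft-thresholds idea concrete, here is a minimal forward-pass sketch, not the authors' implementation: affine "literals" are squashed by tanh, soft AND gates combine subsets of literals into conjunctions, and a soft OR gate combines the conjunctions. The gate offsets (1.5), the sparsity levels, and the random mask construction below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a DNF-structured layer over affine soft-threshold terms.
# tanh serves as a smooth surrogate for Boolean AND/OR; all constants here
# (offsets, sparsity rates, mask construction) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def soft_dnf_forward(x, W, b, masks):
    """x: (d,) input; W: (m, d) literal weights; b: (m,) literal biases;
    masks: (k, m) 0/1 matrix assigning literals to k conjunctions."""
    literals = np.tanh(W @ x + b)                   # soft decision terms in (-1, 1)
    # Soft AND: a conjunction is near +1 only if all of its literals are near +1.
    sizes = masks.sum(axis=1)
    conj = np.tanh(masks @ literals - sizes + 1.5)  # 1.5 offset gives some slack
    # Soft OR: the formula is near +1 if at least one conjunction is near +1.
    return np.tanh(conj.sum() + len(conj) - 1.5)

d, m, k = 10, 12, 4                                      # features, literals, conjunctions
W = rng.normal(size=(m, d)) * (rng.random((m, d)) < 0.3) # sparse weights mimic feature selection
b = rng.normal(size=m)
masks = (rng.random((k, m)) < 0.4).astype(float)         # which literals feed each conjunction
print(soft_dnf_forward(rng.normal(size=d), W, b, masks))
```

The sparse literal weights and the per-conjunction masks illustrate the locality and feature-selection elements of the inductive bias: each conjunction reads only a small subset of the input features.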
