Multimodal AutoML on Structured Tables with Text Fields

We design automated supervised learning systems for data tables that contain not only numeric/categorical columns but also text fields. We assemble a benchmark of 15 multimodal data tables, each containing text fields and stemming from a real business application. On this benchmark, we evaluate numerous multimodal AutoML strategies, including a standard two-stage approach in which NLP techniques first featurize the text so that AutoML for tabular data can then be applied. We propose practically superior strategies based on multimodal adaptations of Transformer networks and on stack ensembling of these networks with classical tabular models. Beyond performing best on our benchmark, our fully automated methodology ranks 1st (against human data scientists) when fit to the raw tabular/text data in two MachineHack prediction competitions, and 2nd (out of 2,380 teams) in Kaggle's Mercari Price Suggestion Challenge.
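
As a minimal sketch of the two-stage baseline described above (not the paper's exact pipeline), the snippet below uses scikit-learn to featurize the text field with TF-IDF, encode the remaining tabular columns, and fit a gradient-boosted model on the combined features. The file and column names (`train.csv`, `description`, `brand`, `weight`, `label`) are hypothetical placeholders.

```python
# Two-stage baseline sketch: featurize the text field with TF-IDF, then
# apply a classical tabular learner to the combined feature matrix.
# All file/column names below are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

train = pd.read_csv("train.csv")  # a table with one text field plus tabular columns

featurizer = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=2000), "description"),  # text field
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["brand"]),   # categorical column
    ("num", "passthrough", ["weight"]),                           # numeric column
])

two_stage = Pipeline([
    ("features", featurizer),
    ("gbm", GradientBoostingClassifier()),  # classical tabular model on fused features
])
two_stage.fit(train.drop(columns="label"), train["label"])
```

In the paper's stronger strategies, this hand-crafted text featurization would be replaced by a pre-trained Transformer adapted to the multimodal inputs.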

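Continuing that sketch, stack ensembling can be illustrated with scikit-learn's StackingClassifier, used here as a stand-in for the paper's stack of Transformer networks and classical tabular models: a meta-learner is fit on out-of-fold predictions of the base models, in the spirit of stacked generalization. This reuses `train`, `two_stage`, and the hypothetical column names from the previous snippet.

```python
# Stack-ensembling sketch (a stand-in for the paper's Transformer + tabular
# model stack): a meta-learner is fit on out-of-fold base-model predictions.
# Reuses `train`, `two_stage`, and the hypothetical columns defined earlier.
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

tabular_only = Pipeline([  # base model that ignores the text field entirely
    ("features", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["brand"]),
        ("num", "passthrough", ["weight"]),
    ])),
    ("rf", RandomForestClassifier()),
])

stack = StackingClassifier(
    estimators=[
        ("text_aware", two_stage),         # the text+tabular pipeline defined above
        ("tabular_rf", tabular_only),
    ],
    final_estimator=LogisticRegression(),  # aggregator fit on out-of-fold predictions
    cv=5,                                  # cross-validated stacking to limit leakage
)
stack.fit(train.drop(columns="label"), train["label"])
```

In the paper's actual system, the base models would be fine-tuned multimodal Transformers alongside classical tabular learners, combined by the stack ensembling described in the abstract rather than by this generic scikit-learn stacker.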