Model-Attentive Ensemble Learning for Sequence Modeling

Medical time-series datasets have unique characteristics that make prediction tasks challenging. Most notably, patient trajectories often contain longitudinal variations in their input-output relationships, generally referred to as temporal conditional shift. Designing sequence models capable of adapting to such time-varying distributions remains an open problem. To address this, we present Model-Attentive Ensemble learning for Sequence modeling (MAES). MAES is a mixture of time-series experts that leverages an attention-based gating mechanism to specialize the experts on different sequence dynamics and adaptively weight their predictions. We demonstrate that MAES significantly outperforms popular sequence models on datasets subject to temporal shift.
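
To make the gating idea concrete, below is a minimal sketch of an attention-gated mixture of recurrent experts. This is an illustrative assumption-laden sketch, not the authors' implementation: the choice of GRU experts, the scoring function, and all names and dimensions are placeholders.

```python
# Illustrative sketch (assumptions, not the MAES reference code): a mixture of
# recurrent experts whose per-step predictions are combined with an
# attention-based gate computed from the input sequence.
import torch
import torch.nn as nn


class AttentiveMixtureOfExperts(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_experts):
        super().__init__()
        # Each expert is an independent sequence model (here, a GRU).
        self.experts = nn.ModuleList(
            [nn.GRU(input_dim, hidden_dim, batch_first=True) for _ in range(num_experts)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, output_dim) for _ in range(num_experts)]
        )
        # Gating network: a query encoder plus a scoring layer that attends
        # over expert hidden states at every time step.
        self.query = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x):
        # x: (batch, time, input_dim)
        q, _ = self.query(x)                                   # (batch, time, hidden)
        expert_outs, scores = [], []
        for gru, head in zip(self.experts, self.heads):
            h, _ = gru(x)                                      # (batch, time, hidden)
            expert_outs.append(head(h))                        # (batch, time, output)
            scores.append(self.score(torch.cat([q, h], -1)))   # (batch, time, 1)
        # Softmax over experts gives time-varying mixture weights.
        weights = torch.softmax(torch.stack(scores, dim=-1), dim=-1)   # (batch, time, 1, E)
        preds = torch.stack(expert_outs, dim=-1)                       # (batch, time, output, E)
        # Per-time-step weighted combination of expert predictions.
        return (preds * weights).sum(dim=-1)                           # (batch, time, output)
```

Training all experts and the gate jointly lets the attention weights route different sequence dynamics to different experts; the exact expert architectures and gating inputs used by MAES may differ from this sketch.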
