Shifts 2.0: Extending The Dataset of Real Distributional Shifts

Distributional shift, the mismatch between training and deployment data, is a significant obstacle to the use of machine learning in high-stakes industrial applications such as autonomous driving and medicine. This creates a need to assess both how robustly ML models generalize and the quality of their uncertainty estimates. Standard ML benchmark datasets do not allow these properties to be assessed, because their training, validation and test data are often identically distributed. Recently, a range of dedicated benchmarks has appeared, featuring both distributionally matched and shifted data. Among these benchmarks, the Shifts dataset stands out for the diversity of its tasks and data modalities. While most benchmarks are heavily dominated by 2D image classification tasks, Shifts contains tabular weather forecasting, machine translation, and vehicle motion prediction tasks. This allows the robustness properties of models to be assessed on a diverse set of industrial-scale tasks, and allows either universal or directly applicable task-specific conclusions to be reached. In this paper, we extend the Shifts dataset [1] with two datasets sourced from industrial, high-risk applications of high societal importance. Specifically, we consider the segmentation of white matter Multiple Sclerosis lesions in 3D magnetic resonance brain images and the estimation of power consumption in marine cargo vessels. Both tasks feature ubiquitous distributional shifts and a strict safety requirement due to the high cost of errors. These new datasets will allow researchers to further explore robust generalization and uncertainty estimation in new

[1]  Tatsunori B. Hashimoto,et al.  Extending the WILDS Benchmark for Unsupervised Adaptation , 2021, ICLR.

[2]  Daguang Xu,et al.  UNETR: Transformers for 3D Medical Image Segmentation , 2021, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).

[3]  Raghav Mehta,et al.  Propagating Uncertainty Across Cascaded Medical Imaging Tasks for Improved Deep Learning Inference , 2019, IEEE Transactions on Medical Imaging.

[4]  Y. Gal,et al.  Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks , 2021, NeurIPS Datasets and Benchmarks.

[5]  O. Ciccarelli,et al.  2021 MAGNIMS–CMSC–NAIMS consensus recommendations on the use of MRI in patients with multiple sclerosis , 2021, The Lancet Neurology.

[6]  Michael W. Dusenberry,et al.  Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning , 2021, ArXiv.

[7]  Andrey Malinin,et al.  Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets , 2021, NeurIPS.

[8]  Mark J. F. Gales,et al.  Uncertainty Estimation in Autoregressive Structured Prediction , 2021, ICLR.

[9]  Bjoern H Menze,et al.  Common Limitations of Image Processing Metrics: A Picture Story , 2021, ArXiv.

[10]  Pang Wei Koh,et al.  WILDS: A Benchmark of in-the-Wild Distribution Shifts , 2020, ICML.

[11]  Loizos Michael,et al.  Neural-Symbolic Integration: A Compositional Perspective , 2020, AAAI.

[12]  Jasper Snoek,et al.  Training independent subnetworks for robust prediction , 2020, ICLR.

[13]  D. Song,et al.  The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Dawn Song,et al.  Natural Adversarial Examples , 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Timnit Gebru,et al.  Datasheets for datasets , 2018, Commun. ACM.

[16]  Zhenzhong Liu,et al.  Review of Deep Learning Approaches for the Segmentation of Multiple Sclerosis Lesions on Brain MRI , 2020, Frontiers in Neuroinformatics.

[17]  Tie-shan Li,et al.  Predicting Ship Fuel Consumption based on LSTM Neural Network , 2020, 2020 7th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS).

[18]  E. Leray,et al.  Rising prevalence of multiple sclerosis worldwide: Insights from the Atlas of MS, third edition , 2020, Multiple sclerosis.

[19]  Nastia Degiuli,et al.  Impact of Hard Fouling on the Ship Performance of Different Ship Forms , 2020, Journal of Marine Science and Engineering.

[20]  Mário João Fartaria,et al.  Multiple sclerosis cortical and WM lesion segmentation at 3T MRI: a deep learning method based on FLAIR and MP2RAGE , 2020, NeuroImage: Clinical.

[21]  Sergey Levine,et al.  Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts? , 2020, ICML.

[22]  Lucia Specia,et al.  Unsupervised Quality Estimation for Neural Machine Translation , 2020, Transactions of the Association for Computational Linguistics.

[23]  Dmitry Vetrov,et al.  Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning , 2020, ICLR.

[24]  Andrey Malinin,et al.  Ensemble Distribution Distillation , 2019, ICLR.

[25]  José Miguel Hernández-Lobato,et al.  Principled Uncertainty Estimation for High Dimensional Data , 2020 .

[26]  Yarin Gal,et al.  A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks , 2019, ArXiv.

[27]  Andrey Malinin,et al.  Uncertainty estimation in deep learning with application to spoken language assessment , 2019 .

[28]  Gerasimos Theotokatos,et al.  Machine learning models for predicting ship main engine Fuel Oil Consumption: A comparative study , 2019, Ocean Engineering.

[29]  David Lopez-Paz,et al.  Invariant Risk Minimization , 2019, ArXiv.

[30]  Yarin Gal,et al.  BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning , 2019, NeurIPS.

[31]  Sebastian Nowozin,et al.  Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift , 2019, NeurIPS.

[32]  J. Woxenius,et al.  Sustainable Short Sea Shipping , 2019, Sustainability.

[33]  David Bonekamp,et al.  Automated brain extraction of multisequence MRI using artificial neural networks , 2019, Human brain mapping.

[34]  Thomas G. Dietterich,et al.  Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.

[35]  E. Boulougouris,et al.  A Study on the Statistical Calibration of the Holtrop and Mennen Approximate Power Prediction Method for Full Hull Form, Low Froude Number Vessels , 2018, Journal of Ship Production and Design.

[36]  Tim Z. Xiao.  Wat heb je gezegd? Detecting Out-of-Distribution Translations with Variational Transformers , 2019 .

[37]  Graham Neubig,et al.  MTNT: A Testbed for Machine Translation of Noisy Text , 2018, EMNLP.

[38]  Doina Precup,et al.  Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation , 2018, MICCAI.

[39]  Martin Styner,et al.  Objective Evaluation of Multiple Sclerosis Lesion Segmentation using a Data Management and Processing Infrastructure , 2018, bioRxiv.

[40]  Dustin Tran,et al.  Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches , 2018, ICLR.

[41]  David H. Miller,et al.  Diagnosis of multiple sclerosis: 2017 revisions of the McDonald criteria , 2017, The Lancet Neurology.

[42]  Peter A. Calabresi,et al.  Longitudinal multiple sclerosis lesion segmentation data resource , 2017, Data in brief.

[43]  Snehashis Roy,et al.  Longitudinal multiple sclerosis lesion segmentation: Resource and challenge , 2017, NeuroImage.

[44]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[45]  Kevin Gimpel,et al.  A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks , 2016, ICLR.

[46]  Bostjan Likar,et al.  A Novel Public MR Image Dataset of Multiple Sclerosis Patients With Lesion Segmentations Based on Multi-rater Consensus , 2017, Neuroinformatics.

[47]  Thomas Brox,et al.  3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation , 2016, MICCAI.

[48]  John Schulman,et al.  Concrete Problems in AI Safety , 2016, ArXiv.

[49]  Mehmet Atlar,et al.  A Study on the Hydrodynamic Effect of Biofouling on Marine Propeller , 2016 .

[50]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[51]  F. Barkhof,et al.  Evidence-based guidelines: MAGNIMS consensus guidelines on the use of MRI in multiple sclerosis—clinical implementation in the diagnostic process , 2015, Nature Reviews Neurology.

[52]  Olivier Commowick,et al.  Block-matching strategies for rigid registration of multimodal medical images , 2012, 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI).

[53]  Brian B. Avants,et al.  N4ITK: Improved N3 Bias Correction , 2010, IEEE Transactions on Medical Imaging.

[54]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[55]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[56]  Masaru Tsujimoto,et al.  A Practical Correction Method for Added Resistance in Waves , 2008 .

[57]  Pierrick Coupé,et al.  An Optimized Blockwise Nonlocal Means Denoising Filter for 3-D Magnetic Resonance Images , 2008, IEEE Transactions on Medical Imaging.

[58]  Neil Bose.  Marine Powering Prediction and Propulsors , 2008 .

[59]  Guido Gerig,et al.  User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability , 2006, NeuroImage.

[60]  Yoshiho Ikeda,et al.  Cruising Performance of a Large Passenger Ship in Heavy Sea , 2006 .

[61]  Volker Bertram,et al.  Practical Ship Hydrodynamics , 2000 .

[62]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[63]  J. Holtrop,et al.  AN APPROXIMATE POWER PREDICTION METHOD , 1982 .

[64]  David F. Rogers,et al.  THE SOCIETY OF NAVAL ARCHITECTS AND MARINE ENGINEERS , 1977 .

[65]  J. D. van Manen,et al.  THE WAGENINGEN B-SCREW SERIES , 1969 .

[66]  T. Sørensen,et al.  A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons , 1948 .

[67]  L. R. Dice.  Measures of the Amount of Ecologic Association Between Species , 1945 .

[68]  商船學校 Marine propellers and propulsion , 1913 .