WILDS: A Benchmark of in-the-Wild Distribution Shifts

Distribution shifts can substantially degrade the performance of a broad range of machine learning (ML) systems deployed in the wild. However, many widely used datasets in the ML community today were not designed for evaluating distribution shifts. These datasets typically have training and test sets drawn from the same distribution, and prior work on retrofitting them with distribution shifts has generally relied on artificial shifts that need not represent the kinds of shifts encountered in the wild. In this paper, we present WILDS, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping. WILDS builds on recent data collection efforts by domain experts in these applications and provides a unified collection of datasets with evaluation metrics and train/test splits that are representative of real-world distribution shifts. These datasets reflect distribution shifts arising from training and testing on different hospitals, cameras, countries, time periods, demographics, molecular scaffolds, and so on, all of which cause substantial performance drops in our baseline models. Finally, we survey other applications that would be promising additions to the benchmark but for which we could not find appropriate datasets; we discuss their associated challenges and describe datasets and shifts for which we did not observe an appreciable performance drop. By unifying datasets from a variety of application areas and making them accessible to the ML community, we hope to encourage the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings. Data loaders, default models, and leaderboards are available at this https URL.
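The core evaluation idea of training on some domains (for example, hospitals) and testing on others can be sketched as a domain-disjoint split, followed by a comparison of accuracy on the training domains with accuracy on the held-out domains. This is a minimal illustration under stated assumptions, not the WILDS package API or its actual split definitions. The `(features, label, domain)` record layout, the helper names `domain_split`, `accuracy`, and `per_domain_accuracy`, the toy data, and the fixed threshold predictor are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# Hypothetical record layout: (features, label, domain), where "domain" is
# a grouping such as a hospital, camera, or country identifier.
Example = Tuple[Sequence[float], int, Hashable]


def domain_split(
    examples: Sequence[Example], test_domains: Set[Hashable]
) -> Tuple[List[Example], List[Example]]:
    """Split examples so that no domain appears in both train and test."""
    train = [ex for ex in examples if ex[2] not in test_domains]
    test = [ex for ex in examples if ex[2] in test_domains]
    return train, test


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def per_domain_accuracy(
    examples: Sequence[Example], preds: Sequence[int]
) -> Dict[Hashable, float]:
    """Accuracy computed separately for each domain."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for (_, label, domain), pred in zip(examples, preds):
        total[domain] += 1
        correct[domain] += int(pred == label)
    return {d: correct[d] / total[d] for d in total}


if __name__ == "__main__":
    # Toy data: two training hospitals and one held-out hospital.
    data = [
        ([0.1], 0, "hospital_A"), ([0.9], 1, "hospital_A"),
        ([0.2], 0, "hospital_B"), ([0.8], 1, "hospital_B"),
        ([0.6], 0, "hospital_C"), ([0.3], 0, "hospital_C"),
    ]
    train, test = domain_split(data, test_domains={"hospital_C"})

    # A fixed threshold stands in for a trained model (illustrative only).
    def predict(x: Sequence[float]) -> int:
        return int(x[0] > 0.5)

    train_acc = accuracy([predict(x) for x, _, _ in train], [y for _, y, _ in train])
    test_acc = accuracy([predict(x) for x, _, _ in test], [y for _, y, _ in test])
    print(f"train-domain accuracy: {train_acc:.2f}, held-out-domain accuracy: {test_acc:.2f}")
    # prints: train-domain accuracy: 1.00, held-out-domain accuracy: 0.50
```

Reporting per-domain accuracy alongside the overall average makes the worst-performing domain visible, which an average over all test examples can hide. In practice one would also hold out in-distribution examples from the training domains rather than scoring on the training data itself.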

[1]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[2]  K Fidelis,et al.  A large‐scale experiment to assess protein structure prediction methods , 1995, Proteins.

[3]  Mario Vento,et al.  A method for improving classification reliability of multilayer perceptrons , 1995, IEEE Trans. Neural Networks.

[4]  W. Guida,et al.  The art and practice of structure‐based drug design: A molecular modeling perspective , 1996, Medicinal research reviews.

[5]  J. Broach,et al.  High-throughput screening for drug discovery. , 1996, Nature.

[6]  Markus Voelter,et al.  State of the Art , 1997, Pediatric Research.

[7]  H. Shimodaira,et al.  Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000 .

[8]  T. Tuschl,et al.  RNA Interference and Small Interfering RNAs , 2001, Chembiochem : a European journal of chemical biology.

[9]  Marco Saerens,et al.  Adjusting the Outputs of a Classifier to New a Priori Probabilities: A Simple Procedure , 2002, Neural Computation.

[10]  D. Sahn,et al.  Exploring Alternative Measures of Welfare in the Absence of Expenditure Data , 2003 .

[11]  Brian K. Shoichet,et al.  Virtual screening of chemical libraries , 2004, Nature.

[12]  Gerhard Widmer,et al.  Learning in the Presence of Concept Drift and Hidden Contexts , 1996, Machine Learning.

[13]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[14]  Koby Crammer,et al.  Analysis of Representations for Domain Adaptation , 2006, NIPS.

[15]  David J. Hand,et al.  Classifier Technology and the Illusion of Progress , 2006, math/0606441.

[16]  N. Perrimon,et al.  High-throughput RNAi screening in cultured cells: a user's guide , 2006, Nature Reviews Genetics.

[17]  Hal Daumé,et al.  Frustratingly Easy Domain Adaptation , 2007, ACL.

[18]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[19]  A. Gelman,et al.  An Analysis of the New York City Police Department's “Stop-and-Frisk” Policy in the Context of Claims of Racial Bias , 2007 .

[20]  Peter L. Bartlett,et al.  Classification with a Reject Option using a Hinge Loss , 2008, J. Mach. Learn. Res..

[21]  Yishay Mansour,et al.  Domain Adaptation with Multiple Sources , 2008, NIPS.

[22]  A. Tatem,et al.  Using remotely sensed night-time light as a proxy for poverty in Africa , 2008, Population health metrics.

[23]  Romain Robbes,et al.  How Program History Can Improve Code Completion , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering.

[24]  Mira Mezini,et al.  Learning from examples to improve code completion systems , 2009, ESEC/SIGSOFT FSE.

[25]  A. Madabhushi,et al.  Histopathological Image Analysis: A Review , 2009, IEEE Reviews in Biomedical Engineering.

[26]  Neil D. Lawrence,et al.  Dataset Shift in Machine Learning , 2009 .

[27]  J. S. Marron,et al.  A method for normalizing histology slides for quantitative analysis , 2009, 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro.

[28]  Alexander Shapiro,et al.  Lectures on Stochastic Programming: Modeling and Theory , 2009 .

[29]  Budhendra L. Bhaduri,et al.  A global poverty map derived from satellite data , 2009, Comput. Geosci..

[30]  David Notkin,et al.  Using twinning to adapt programs to alternative APIs , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[31]  David M. Simcha,et al.  Tackling the widespread and critical impact of batch effects in high-throughput data , 2010, Nature Reviews Genetics.

[32]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[33]  Lorenzo Bruzzone,et al.  Domain Adaptation Problems: A DASVM Classification Technique and a Circular Validation Strategy , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Shawn D. Newsam,et al.  Bag-of-visual-words and spatial extensions for land-use classification , 2010, GIS '10.

[35]  Richard J. Lemke,et al.  The Creation and Validation of the Ohio Risk Assessment System ( ORAS ) , 2010 .

[36]  Qiang Yang,et al.  Cross-domain sentiment classification via spectral feature alignment , 2010, WWW '10.

[37]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[38]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[39]  Andrew H. Beck,et al.  Systematic Analysis of Breast Cancer Morphology Uncovers Stromal Features Associated with Survival , 2011, Science Translational Medicine.

[40]  Michel C. Desmarais,et al.  A review of recent advances in learner and skill modeling in intelligent learning environments , 2012, User Modeling and User-Adapted Interaction.

[41]  Gilles Blanchard,et al.  Generalizing from Several Related Classification Tasks to a New Unlabeled Sample , 2011, NIPS.

[42]  S. Rees,et al.  Principles of early drug discovery , 2011, British journal of pharmacology.

[43]  D. Bojanic,et al.  Impact of high-throughput screening in biomedical research , 2011, Nature Reviews Drug Discovery.

[44]  D. Swinney,et al.  How were new medicines discovered? , 2011, Nature Reviews Drug Discovery.

[45]  D. Filmer,et al.  Assessing Asset Indices , 2008, Demography.

[46]  Kerrie A. Pipal,et al.  Estimating Escapement for a Low-Abundance Steelhead Population Using Dual-Frequency Identification Sonar (DIDSON) , 2012 .

[47]  Jeffrey T Leek,et al.  Statistical Applications in Genetics and Molecular Biology The practical effect of batch on genomic prediction , 2012 .

[48]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  K. Dill,et al.  The Protein-Folding Problem, 50 Years On , 2012, Science.

[50]  ENCODEConsortium,et al.  An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.

[51]  Hongbing Shen,et al.  The influence of race and ethnicity on the biology of cancer , 2012, Nature Reviews Cancer.

[52]  Toniann Pitassi,et al.  Fairness through awareness , 2011, ITCS '12.

[53]  Anne E Carpenter,et al.  Annotated high-throughput microscopy image sets for validation , 2012, Nature Methods.

[54]  Ye Xu,et al.  Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias , 2013, 2013 IEEE International Conference on Computer Vision.

[55]  Ruili Huang,et al.  The Tox21 robotic platform for the assessment of environmental chemicals--from vision to reality. , 2013, Drug discovery today.

[56]  C. Justice,et al.  High-Resolution Global Maps of 21st-Century Forest Cover Change , 2013, Science.

[57]  Zhenghao Chen,et al.  Tuned Models of Peer Assessment in MOOCs , 2013, EDM.

[58]  Joaquin Quiñonero Candela,et al.  Counterfactual reasoning and learning systems: the example of computational advertising , 2013, J. Mach. Learn. Res..

[59]  Bernhard Schölkopf,et al.  Domain Adaptation under Target and Conditional Shift , 2013, ICML.

[60]  Bernhard Schölkopf,et al.  Domain Generalization via Invariant Feature Representation , 2013, ICML.

[61]  Justin Cheng,et al.  Peer and self assessment in massive online classes , 2013, ACM Trans. Comput. Hum. Interact..

[62]  Joshua M. Stuart,et al.  The Cancer Genome Atlas Pan-Cancer analysis project , 2013, Nature Genetics.

[63]  Trevor Darrell,et al.  Deep Domain Confusion: Maximizing for Domain Invariance , 2014, CVPR 2014.

[64]  Eran Yahav,et al.  Code completion with statistical language models , 2014, PLDI.

[65]  Michael S. Bernstein,et al.  Scaling short-answer grading by combining peer assessment with algorithmic scoring , 2014, L@S.

[66]  William Stafford Noble,et al.  Comparative analysis of metazoan chromatin , 2014 .

[67]  Mark D. Shermis,et al.  State-of-the-art automated essay scoring: Competition, results, and future directions from a United States demonstration , 2014 .

[68]  Les Perelman,et al.  When “the state of the art” is counting words , 2014 .

[69]  Shiyou Zhu,et al.  High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells , 2014, Nature.

[70]  Kush R. Varshney,et al.  Targeting direct cash transfers to the extremely poor , 2014, KDD.

[71]  David A. Clifton,et al.  A review of novelty detection , 2014, Signal Process..

[72]  Charlotte Soneson,et al.  Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation , 2014, PloS one.

[73]  Jure Leskovec,et al.  Exploiting Social Network Structure for Person-to-Person Sentiment Analysis , 2014, TACL.

[74]  Moritz Herrmann,et al.  Comparative analysis of metazoan chromatin organization , 2014, Nature.

[75]  A. Becke Perspective: Fifty years of density-functional theory in chemical physics. , 2014, The Journal of chemical physics.

[76]  Marie Persson,et al.  Improved concept drift handling in surgery prediction and other applications , 2015, Knowledge and Information Systems.

[77]  Raymond Y. K. Lau,et al.  Social analytics: Learning fuzzy product ontologies for aspect-oriented sentiment analysis , 2014, Decis. Support Syst..

[78]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[79]  Sanjeev Khudanpur,et al.  Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[80]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[81]  John J. Irwin,et al.  ZINC 15 – Ligand Discovery for Everyone , 2015, J. Chem. Inf. Model..

[82]  N. Meinshausen,et al.  Maximin effects in inhomogeneous large-scale data , 2014, 1406.0596.

[83]  Gabriel Cadamuro,et al.  Predicting poverty and wealth from mobile phone metadata , 2015, Science.

[84]  Mira Mezini,et al.  Intelligent Code Completion with Bayesian Networks , 2015, ACM Trans. Softw. Eng. Methodol..

[85]  Justin M. Rao,et al.  Precinct or Prejudice? Understanding Racial Disparities in New York City's Stop-and-Frisk Policy , 2015 .

[86]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[87]  Raymond Lister,et al.  Exploring Machine Learning Methods to Automatically Identify Students in Need of Assistance , 2015, ICER.

[88]  Dirk Hovy,et al.  Challenges of studying and processing dialects in social media , 2015, NUT@IJCNLP.

[89]  William Stafford Noble,et al.  Machine learning applications in genetics and genomics , 2015, Nature Reviews Genetics.

[90]  Jonas Peters,et al.  Causal inference by using invariant prediction: identification and confidence intervals , 2015, 1501.01332.

[91]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[92]  M. Boutros,et al.  Microscopy-Based High-Content Screening , 2015, Cell.

[93]  Premkumar T. Devanbu,et al.  CACHECA: A Cache Language Model Based Code Suggestion Tool , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[94]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[95]  Charles A. Sutton,et al.  Suggesting accurate method and class names , 2015, ESEC/SIGSOFT FSE.

[96]  O. Troyanskaya,et al.  Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.

[97]  Anh Tuan Nguyen,et al.  Graph-Based Statistical Language Model for Code , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[98]  Michael Q. Zhang,et al.  Integrative analysis of 111 reference human epigenomes , 2015, Nature.

[99]  Kate Saenko,et al.  Deep CORAL: Correlation Alignment for Deep Domain Adaptation , 2016, ECCV Workshops.

[100]  David R. Kelley,et al.  Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks , 2015, bioRxiv.

[101]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[102]  Gang Fu,et al.  PubChem Substance and Compound databases , 2015, Nucleic Acids Res..

[103]  Neil T. Heffernan,et al.  AXIS: Generating Explanations at Scale with Learnersourcing and Machine Learning , 2016, L@S.

[104]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[105]  Anne E Carpenter,et al.  Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes , 2016, Nature Protocols.

[106]  Nithya Rajan,et al.  Unmanned Aerial Vehicles for High-Throughput Phenotyping and Agronomic Research , 2016, PloS one.

[107]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[108]  Sang Michael Xie,et al.  Combining satellite imagery and machine learning to predict poverty , 2016, Science.

[109]  Martin T. Vechev,et al.  Probabilistic model for code with decision trees , 2016, OOPSLA.

[110]  Dirk Hovy,et al.  The Social Impact of Natural Language Processing , 2016, ACL.

[111]  Brendan T. O'Connor,et al.  Demographic Dialectal Variation in Social Media: A Case Study of African-American English , 2016, EMNLP.

[112]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[113]  Peter Szolovits,et al.  MIMIC-III, a freely accessible critical care database , 2016, Scientific Data.

[114]  E. Hovig,et al.  Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses , 2015, Biostatistics.

[115]  Stefano Ermon,et al.  Transfer Learning from Deep Features for Remote Sensing and Poverty Mapping , 2015, AAAI.

[116]  Mitko Veta,et al.  Mitosis Counting in Breast Cancer: Object-Level Interobserver Agreement and Comparison to an Automatic Method , 2016, PloS one.

[117]  Mira Mezini,et al.  Evaluating the evaluations of code recommender systems: A reality check , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[118]  Hwee Tou Ng,et al.  A Neural Approach to Automated Essay Scoring , 2016, EMNLP.

[119]  M. Burke,et al.  Sources of variation in under-5 mortality across sub-Saharan Africa: a spatial analysis. , 2016, The Lancet. Global health.

[120]  Kate Saenko,et al.  Return of Frustratingly Easy Domain Adaptation , 2015, AAAI.

[121]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[122]  Li Fei-Fei,et al.  CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[123]  Stephan J. Garbin,et al.  Harmonic Networks: Deep Translation and Rotation Equivariance , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[124]  Lassi Paavolainen,et al.  Data-analysis strategies for image-based cell profiling , 2017, Nature Methods.

[125]  Jiaying Liu,et al.  Revisiting Batch Normalization For Practical Domain Adaptation , 2016, ICLR.

[126]  Jason Weston,et al.  Learning Through Dialogue Interactions , 2016, ICLR.

[127]  Brian A. Malloy,et al.  Quantifying the Transition from Python 2 to 3: An Empirical Study of Python Applications , 2017, 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

[128]  Ran El-Yaniv,et al.  Selective Classification for Deep Neural Networks , 2017, NIPS.

[129]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[130]  Jason Weston,et al.  Dialogue Learning With Human-In-The-Loop , 2016, ICLR.

[131]  Dorit Merhof,et al.  Context-Based Normalization of Histological Stains Using Deep Convolutional Features , 2017, DLMIA/ML-CDS@MICCAI.

[132]  T. Mockler,et al.  High throughput phenotyping to accelerate crop breeding and monitoring of diseases in the field. , 2017, Current opinion in plant biology.

[133]  Marc Berndl,et al.  Improving Phenotypic Measurements in High-Content Imaging Screens , 2017, bioRxiv.

[134]  Tanya Y. Berger-Wolf,et al.  Animal Population Censusing at Scale with Citizen Science and Photographic Identification , 2017, AAAI Spring Symposia.

[135]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[136]  Avi Feller,et al.  Algorithmic Decision Making and the Cost of Fairness , 2017, KDD.

[137]  Brendan T. O'Connor,et al.  Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English , 2017, ArXiv.

[138]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[139]  Marc Brockschmidt,et al.  SmartPaste: Learning to Adapt Source Code , 2017, ArXiv.

[140]  Kevin Gimpel,et al.  A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks , 2016, ICLR.

[141]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[142]  Stefano Ermon,et al.  Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data , 2017, AAAI.

[143]  Sang Cheol Kim,et al.  A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases and Pests Recognition , 2017, Sensors.

[144]  Lina J. Karam,et al.  A Study and Comparison of Human and Deep Learning Recognition Performance under Visual Distortions , 2017, 2017 26th International Conference on Computer Communication and Networks (ICCCN).

[145]  Rachael Tatman,et al.  Gender and Dialect Bias in YouTube’s Automatic Captions , 2017, EthNLP@EACL.

[146]  Aleksey Boyko,et al.  Detecting Cancer Metastases on Gigapixel Pathology Images , 2017, ArXiv.

[147]  D. Sculley,et al.  No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World , 2017, 1711.08536.

[148]  Xianming Liu,et al.  Mapping the world population one building at a time , 2017, ArXiv.

[149]  Yongxin Yang,et al.  Deeper, Broader and Artier Domain Generalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[150]  Sethuraman Panchanathan,et al.  Deep Hashing Network for Unsupervised Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[151]  Andrew H. Beck,et al.  Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer , 2017, JAMA.

[152]  Min Bai,et al.  TorontoCity: Seeing the World with a Million Eyes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[153]  Daniel Quang,et al.  FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data , 2017, bioRxiv.

[154]  Vijay S. Pande,et al.  MoleculeNet: a benchmark for molecular machine learning , 2017, Chemical science.

[155]  Razvan Pascanu,et al.  Learning to Navigate in Complex Environments , 2016, ICLR.

[156]  Pouria Sadeghi-Tehran,et al.  Multi-feature machine learning model for automatic segmentation of green fractional vegetation cover for high-throughput field phenotyping , 2017, Plant Methods.

[157]  Guanhua Chen,et al.  Calibration drift in regression and machine learning models for acute kidney injury , 2017, J. Am. Medical Informatics Assoc..

[158]  Sebastian Thrun,et al.  Dermatologist-level classification of skin cancer with deep neural networks , 2017, Nature.

[159]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[160]  Sergey Levine,et al.  Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[161]  Krishna P. Gummadi,et al.  From Parity to Preference-based Notions of Fairness in Classification , 2017, NIPS.

[162]  Limsoon Wong,et al.  Why Batch Effects Matter in Omics Data, and How to Avoid Them. , 2017, Trends in biotechnology.

[163]  Sergey Levine,et al.  (CAD)$^2$RL: Real Single-Image Flight without a Single Real Image , 2016, Robotics: Science and Systems.

[164]  John K. Tsotsos,et al.  Elephant in the room , 2018 .

[165]  Ashirbani Saha,et al.  Deep learning for segmentation of brain tumors: Impact of cross‐institutional training and testing , 2018, Medical physics.

[166]  Gang Niu,et al.  Does Distributionally Robust Supervised Learning Give Robust Classifiers? , 2016, ICML.

[167]  Jieyu Zhao,et al.  Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods , 2018, NAACL.

[168]  Evelin Amorim,et al.  Automated Essay Scoring in the Presence of Biased Ratings , 2018, NAACL.

[169]  Percy Liang,et al.  Fairness Without Demographics in Repeated Loss Minimization , 2018, ICML.

[170]  Timnit Gebru,et al.  Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification , 2018, FAT.

[171]  Pascale Fung,et al.  Reducing Gender Bias in Abusive Language Detection , 2018, EMNLP.

[172]  Seth Neel,et al.  Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness , 2017, ICML.

[173]  Guillaume Lample,et al.  XNLI: Evaluating Cross-lingual Sentence Representations , 2018, EMNLP.

[174]  Maria Rigaki,et al.  Bringing a GAN to a Knife-Fight: Adapting Malware Communication to Avoid Detection , 2018, 2018 IEEE Security and Privacy Workshops (SPW).

[175]  Ghassan Al-Regib,et al.  CURE-OR: Challenging Unreal and Real Environments for Object Recognition , 2018, 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA).

[176]  Max Welling,et al.  Rotation Equivariant CNNs for Digital Pathology , 2018, MICCAI.

[177]  Pietro Perona,et al.  Recognition in Terra Incognita , 2018, ECCV.

[178]  Nancy Fullman,et al.  Mapping local variation in educational attainment across Africa , 2018, Nature.

[179]  Luc Van Gool,et al.  Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[180]  Swami Sankaranarayanan,et al.  MetaReg: Towards Domain Generalization using Meta-Regularization , 2018, NeurIPS.

[181]  Hervé Glotin,et al.  Automatic acoustic detection of birds through deep learning: The first Bird Audio Detection challenge , 2018, Methods in Ecology and Evolution.

[182]  Lucy Vasserman,et al.  Measuring and Mitigating Unintended Bias in Text Classification , 2018, AIES.

[183]  David L. Smith,et al.  Mapping child growth failure in Africa between 2000 and 2015 , 2018, Nature.

[184]  Esther Rolf,et al.  Delayed Impact of Fair Machine Learning , 2018, ICML.

[185]  Gabrielle Berman,et al.  Ethical Considerations when Using Geospatial Technologies for Evidence Generation , 2018, Innocenti Research Briefs.

[186]  R. Srikant,et al.  Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks , 2017, ICLR.

[187]  Matthias Bethge,et al.  Generalisation in humans and deep neural networks , 2018, NeurIPS.

[188]  Michael A. Tabak,et al.  Machine learning to classify animal species in camera trap images: applications in ecology , 2018, bioRxiv.

[189]  Shubhra Aich,et al.  DeepWheat: Estimating Phenotypic Traits from Crop Images with Deep Learning , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[190]  Z. Katona,et al.  On the Capital Market Consequences of Alternative Data: Evidence from Outer Space , 2018 .

[191]  Daisuke Komura,et al.  Machine Learning Methods for Histopathological Image Analysis , 2017, Computational and structural biotechnology journal.

[192]  N. Handegard,et al.  A method to automatically detect fish aggregations using horizontally scanning sonar , 2018 .

[193]  Alexander J. Smola,et al.  Detecting and Correcting for Label Shift with Black Box Predictors , 2018, ICML.

[194]  J. Keilwagen,et al.  Accurate prediction of cell type-specific transcription factor binding , 2019, Genome Biology.

[195]  Marco Baroni,et al.  Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks , 2017, ICML.

[196]  Joon Son Chung,et al.  VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.

[197]  Abhinav Gupta,et al.  Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias , 2018, NeurIPS.

[198]  Stefano Ermon,et al.  Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance , 2018, NeurIPS.

[199]  Sebastian Caldas,et al.  LEAF: A Benchmark for Federated Settings , 2018, ArXiv.

[200]  Hany Farid,et al.  The accuracy, fairness, and limits of predicting recidivism , 2018, Science Advances.

[201]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[202]  Dhruv Batra,et al.  Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[203]  Omer Levy,et al.  GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.

[204]  David L. Smith,et al.  Local variation in childhood diarrheal morbidity and mortality in Africa, 2000-2015 , 2018, The New England journal of medicine.

[205]  Emily M. Bender,et al.  Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science , 2018, TACL.

[206]  John C. Duchi,et al.  Learning Models with Uniform Performance via Distributionally Robust Optimization , 2018, ArXiv.

[207]  Gordon Christie,et al.  Functional Map of the World , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[208]  D. Tao,et al.  Deep Domain Generalization via Conditional Invariant Adversarial Networks , 2018, ECCV.

[209]  Ghassan Hamarneh,et al.  Adversarial Stain Transfer for Histopathology Image Analysis , 2018, IEEE Transactions on Medical Imaging.

[210]  Joseph Gomes,et al.  MoleculeNet: a benchmark for molecular machine learning† †Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc02664a , 2017, Chemical science.

[211]  Sharad Goel,et al.  The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning , 2018, ArXiv.

[212]  Yongxin Yang,et al.  Learning to Generalize: Meta-Learning for Domain Generalization , 2017, AAAI.

[213]  Jian Shen,et al.  Wasserstein Distance Guided Representation Learning for Domain Adaptation , 2017, AAAI.

[214]  Alex ChiChung Kot,et al.  Domain Generalization with Adversarial Feature Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[215]  Xian Zhang,et al.  Unsupervised phenotypic analysis of cellular images with multi-scale convolutional neural networks , 2018, bioRxiv.

[216]  Silvio Savarese,et al.  Generalizing to Unseen Domains via Adversarial Data Augmentation , 2018, NeurIPS.

[217]  Regina Barzilay,et al.  Multi-Source Domain Adaptation with Mixture of Experts , 2018, EMNLP.

[218]  Kate Saenko,et al.  VisDA: A Synthetic-to-Real Benchmark for Visual Domain Adaptation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[219]  Nico Karssemeijer,et al.  Whole-Slide Mitosis Detection in H&E Breast Histology Using PHH3 as a Reference to Train Distilled Stain-Invariant Convolutional Networks , 2018, IEEE Transactions on Medical Imaging.

[220]  Alex Bewley,et al.  Incremental Adversarial Domain Adaptation for Continually Changing Environments , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[221]  Anne E Carpenter,et al.  Opportunities and obstacles for deep learning in biology and medicine , 2017, bioRxiv.

[222]  Kelly R. Thorp,et al.  High-Throughput Phenotyping of Crop Water Use Efficiency via Multispectral Drone Imagery and a Daily Soil Water Balance Model , 2018, Remote. Sens..

[223]  Matthew J. Hausknecht,et al.  Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis , 2018, ICLR.

[224]  Maoguo Gong,et al.  Automatic Tobacco Plant Detection in UAV Images via Deep Neural Networks , 2018, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[225]  Ben. G. Weinstein A computer vision for animal ecology. , 2018, The Journal of animal ecology.

[226]  Marcus A. Badgeley,et al.  Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study , 2018, PLoS medicine.

[227]  David Sontag,et al.  Why Is My Classifier Discriminatory? , 2018, NeurIPS.

[228]  Marcus A. Badgeley,et al.  Deep learning predicts hip fracture using confounding patient and healthcare variables , 2018, npj Digital Medicine.

[229]  Jason Yosinski,et al.  R X R X 1: A N IMAGE SET FOR CELLULAR MORPHOLOGICAL VARIATION ACROSS MANY EXPERIMENTAL BATCHES , 2019 .

[230]  Yoav Goldberg,et al.  Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets , 2019, EMNLP.

[231]  Ran El-Yaniv,et al.  SelectiveNet: A Deep Neural Network with an Integrated Reject Option , 2019, ICML.

[232]  Matthias Bethge,et al.  ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness , 2018, ICLR.

[233]  Y. Guan,et al.  Anchor: trans-cell type prediction of transcription factor binding sites , 2018, Genome research.

[234]  Percy Liang,et al.  SPoC: Search-based Pseudocode to Code , 2019, NeurIPS.

[235]  Shaoqun Zeng,et al.  From Detection of Individual Metastases to Classification of Lymph Node Status at the Patient Level: The CAMELYON17 Challenge , 2019, IEEE Transactions on Medical Imaging.

[236]  Sebastian Nowozin,et al.  Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift , 2019, NeurIPS.

[237]  Noel C. F. Codella,et al.  Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC) , 2019, ArXiv.

[238]  Rémi Louf,et al.  HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.

[239]  Hongyang Li,et al.  Fast decoding cell type–specific transcription factor binding landscape at single-nucleotide resolution , 2019, bioRxiv.

[240]  Frédéric Baret,et al.  Ear density estimation from high resolution RGB imagery using deep learning technique , 2019, Agricultural and Forest Meteorology.

[241]  Marcin Andrychowicz,et al.  Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.

[242]  Bo Wang,et al.  Moment Matching for Multi-Source Domain Adaptation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[243]  Judy Hoffman,et al.  Predictive Inequity in Object Detection , 2019, ArXiv.

[244]  Geert J. S. Litjens,et al.  Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology , 2019, Medical Image Anal.

[245]  Tarak Shah,et al.  Measures of Fairness for New York City’s Supervised Release Risk Assessment Tool , 2019 .

[246]  Zoe Wilson,et al.  Yielding to the image: How phenotyping reproductive growth can assist crop improvement and production , 2019, Plant science: an international journal of experimental plant biology.

[247]  M. Ghassemi,et al.  Can AI Help Reduce Disparities in General Medical and Mental Health Care? , 2019, AMA journal of ethics.

[248]  Jianmo Ni,et al.  Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects , 2019, EMNLP.

[249]  R. Thomas McCoy,et al.  Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference , 2019, ACL.

[250]  Eric P. Xing,et al.  Learning Robust Global Representations by Penalizing Local Predictive Power , 2019, NeurIPS.

[251]  Benjamin Recht,et al.  Do ImageNet Classifiers Generalize to ImageNet? , 2019, ICML.

[252]  Brian W. Powers,et al.  Dissecting racial bias in an algorithm used to manage the health of populations , 2019, Science.

[253]  Arjun Sondhi,et al.  Selective prediction-set models with coverage guarantees , 2019, ArXiv.

[254]  Ron Kimmel,et al.  Data Augmentation for Leaf Segmentation and Counting Tasks in Rosette Plants , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[255]  Yang Song,et al.  Class-Balanced Loss Based on Effective Number of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[256]  Aditya Kanade,et al.  Neural Program Repair by Jointly Learning to Localize and Repair , 2019, ICLR.

[257]  Thomas G. Dietterich,et al.  Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.

[258]  Yifan Wu,et al.  Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment , 2019, ICML.

[259]  Jure Leskovec,et al.  How Powerful are Graph Neural Networks? , 2018, ICLR.

[260]  Hao Lu,et al.  TasselNetv2: in-field counting of wheat spikes with context-augmented local regression networks , 2019, Plant Methods.

[261]  Boris Katz,et al.  ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models , 2019, NeurIPS.

[262]  John C. Duchi,et al.  Distributionally Robust Losses Against Mixture Covariate Shifts , 2019 .

[263]  Kamyar Azizzadenesheli,et al.  Regularized Learning for Domain Adaptation under Label Shifts , 2019, ICLR.

[264]  Yurii S. Moroz,et al.  Ultra-large library docking for discovering new chemotypes , 2019, Nature.

[265]  David Lopez-Paz,et al.  Invariant Risk Minimization , 2019, ArXiv.

[266]  Percy Liang,et al.  Distributionally Robust Language Modeling , 2019, EMNLP.

[267]  Dan Morris,et al.  Efficient Pipeline for Camera Trap Image Review , 2019, ArXiv.

[268]  Neel Sundaresan,et al.  Pythia: AI-assisted Code Completion System , 2019, KDD.

[269]  Fabio Maria Carlucci,et al.  Domain Generalization by Solving Jigsaw Puzzles , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[270]  G. Wainrib,et al.  Deep learning-based classification of mesothelioma improves prediction of patient outcome , 2019, Nature Medicine.

[271]  Shila Ghazanfar,et al.  The human body at cellular resolution: the NIH Human Biomolecular Atlas Program , 2019, Nature.

[272]  David G. Knowles,et al.  Predicting Splicing from Primary Sequence with Deep Learning , 2019, Cell.

[273]  Bernt Schiele,et al.  Not Using the Car to See the Sidewalk — Quantifying and Controlling the Effects of Context in Classification and Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[274]  Yejin Choi,et al.  The Risk of Racial Bias in Hate Speech Detection , 2019, ACL.

[275]  Fumio Okura,et al.  How Convolutional Neural Networks Diagnose Plant Disease , 2019, Plant phenomics.

[276]  Fabian J Theis,et al.  Deep learning: new computational modelling techniques for genomics , 2019, Nature Reviews Genetics.

[277]  Walter Jetz,et al.  Wildlife Insights: A Platform to Maximize the Potential of Camera Trap and Other Passive Sensor Wildlife Data for the Planet , 2019, Environmental Conservation.

[278]  W. Price,et al.  Privacy in the age of medical big data , 2019, Nature Medicine.

[279]  Daniel C. Castro,et al.  Domain Generalization via Model-Agnostic Learning of Semantic Features , 2019, NeurIPS.

[280]  Rishabh Singh,et al.  Synthetic Datasets for Neural Program Synthesis , 2019, ICLR.

[281]  Thomas Wolf,et al.  DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.

[282]  Lucy Vasserman,et al.  Limitations of Pinned AUC for Measuring Unintended Bias , 2019, ArXiv.

[283]  Imran Shah,et al.  Considerations for Strategic Use of High-Throughput Transcriptomics Chemical Screening Data in Regulatory Decisions , 2019, Current opinion in toxicology.

[284]  Thomas J. Fuchs,et al.  Clinical-grade computational pathology using weakly supervised deep learning on whole slide images , 2019, Nature Medicine.

[285]  Shiori Sagawa,et al.  Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization , 2019, ArXiv.

[286]  Timo Baumann,et al.  The Spoken Wikipedia Corpus collection: Harvesting, alignment and an application to hyperlistening , 2019, Lang. Resour. Evaluation.

[287]  Lucy Vasserman,et al.  Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification , 2019, WWW.

[288]  Zachary C. Lipton,et al.  What is the Effect of Importance Weighting in Deep Learning? , 2018, ICML.

[289]  M. Kuo,et al.  Frequency and Distribution of Chest Radiographic Findings in COVID-19 Positive Patients , 2020, Radiology.

[290]  Charles E. McAnany,et al.  Deep learning at base-resolution reveals motif syntax of the cis-regulatory code , 2019, bioRxiv.

[291]  Colin Wei,et al.  Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss , 2019, NeurIPS.

[292]  Karl Rohr,et al.  Predicting breast tumor proliferation from whole‐slide images: The TUPAC16 challenge , 2018, Medical Image Anal.

[293]  D. Sculley,et al.  The Inclusive Images Competition , 2019 .

[294]  Christopher Ré,et al.  Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices , 2019, NeurIPS.

[295]  Daniel Reker,et al.  Practical considerations for active machine learning in drug discovery , 2019, Drug discovery today. Technologies.

[296]  Harald C. Gall,et al.  When Code Completion Fails: A Case Study on Real-World Completions , 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[297]  Frank Hutter,et al.  Decoupled Weight Decay Regularization , 2017, ICLR.

[298]  Atil Iscen,et al.  Data Efficient Reinforcement Learning for Legged Robots , 2019, CoRL.

[299]  Omer Levy,et al.  SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems , 2019, NeurIPS.

[300]  Anna Goldenberg,et al.  Feature Robustness in Non-stationary Health Records: Caveats to Deployable Model Performance in Common Clinical Machine Learning Tasks , 2019, MLHC.

[301]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[302]  Ran El-Yaniv,et al.  Bias-Reduced Uncertainty Estimation for Deep Neural Classifiers , 2018, ICLR.

[303]  Christopher D. Brown,et al.  The GTEx Consortium atlas of genetic regulatory effects across human tissues , 2019, Science.

[304]  Mike Wu,et al.  Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference , 2018, AAAI.

[305]  Guillaume Lample,et al.  Cross-lingual Language Model Pretraining , 2019, NeurIPS.

[306]  Shaun Mahony,et al.  Sequence and chromatin determinants of transcription factor binding and the establishment of cell type-specific binding patterns , 2020, Biochimica et biophysica acta. Gene regulatory mechanisms.

[307]  Marc Brockschmidt,et al.  CodeSearchNet Challenge: Evaluating the State of Semantic Code Search , 2019, ArXiv.

[308]  Jason Baldridge,et al.  PAWS: Paraphrase Adversaries from Word Scrambling , 2019, NAACL.

[309]  Percy Liang,et al.  Graph-based, Self-Supervised Program Repair from Diagnostic Feedback , 2020, ICML.

[310]  Hady Elsahar,et al.  Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages , 2020, FINDINGS.

[311]  Lauren Wilcox,et al.  A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy , 2020, CHI.

[312]  Leo Celi,et al.  Evaluating Progress on Machine Learning for Longitudinal Electronic Healthcare Data , 2020, ArXiv.

[313]  Stefano Ermon,et al.  Learning When and Where to Zoom With Deep Reinforcement Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[314]  Anne E Carpenter,et al.  Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery , 2020, bioRxiv.

[315]  Dawn Song,et al.  Scaling Out-of-Distribution Detection for Real-World Settings , 2022, ICML.

[316]  Stephan Hoyer,et al.  Correcting nuisance variation using Wasserstein distance , 2017, PeerJ.

[317]  Jongbin Jung,et al.  The limits of human predictions of recidivism , 2020, Science Advances.

[318]  Michael J. Purcaro,et al.  Expanded encyclopaedias of DNA elements in the human and mouse genomes , 2020, Nature.

[319]  S. Levine,et al.  Learning Agile Robotic Locomotion Skills by Imitating Animals , 2020, Robotics: Science and Systems.

[320]  M. Bethge,et al.  Shortcut learning in deep neural networks , 2020, Nature Machine Intelligence.

[321]  Xiaobai Liu,et al.  Deep neural networks for automated detection of marine mammal species , 2020, Scientific Reports.

[322]  Krishna P. Gummadi,et al.  FairRec: Two-Sided Fairness for Personalized Recommendations in Two-Sided Platforms , 2020, WWW.

[323]  Dan Jurafsky,et al.  Racial disparities in automated speech recognition , 2020, Proceedings of the National Academy of Sciences.

[324]  Pang Wei Koh,et al.  An Investigation of Why Overparameterization Exacerbates Spurious Correlations , 2020, ICML.

[325]  Mohammad Sadegh Norouzzadeh,et al.  A deep active learning system for species identification and counting in camera trap images , 2019, Methods in Ecology and Evolution.

[326]  Nicolas Flammarion,et al.  RobustBench: a standardized adversarial robustness benchmark , 2020, NeurIPS Datasets and Benchmarks.

[327]  Minhajul A. Badhon,et al.  Global Wheat Head Detection (GWHD) Dataset: A Large and Diverse Dataset of High-Resolution RGB-Labelled Images to Develop and Benchmark Wheat Head Detection Methods , 2020, Plant phenomics.

[328]  Dragomir Anguelov,et al.  Scalability in Perception for Autonomous Driving: Waymo Open Dataset , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[329]  Benjamin Recht,et al.  The Effect of Natural Distribution Shift on Question Answering Models , 2020, ICML.

[330]  Richard Zemel,et al.  Fairness and Robustness in Invariant Learning: A Case Study in Toxicity Classification , 2020, ArXiv.

[331]  Christopher Ré,et al.  Overton: A Data System for Monitoring and Improving Machine-Learned Products , 2019, CIDR.

[332]  Sivaraman Balakrishnan,et al.  A Unified View of Label Shift Estimation , 2020, NeurIPS.

[333]  Sameer Singh,et al.  Beyond Accuracy: Behavioral Testing of NLP Models with CheckList , 2020, ACL.

[334]  Percy Liang,et al.  Robustness to Spurious Correlations via Human Annotations , 2020, ICML.

[335]  Been Kim,et al.  Concept Bottleneck Models , 2020, ICML.

[336]  H. Kühl,et al.  Listening and watching: Do camera traps or acoustic sensors more efficiently detect wild chimpanzees in an open habitat? , 2020, Methods in Ecology and Evolution.

[337]  Ian Stavness,et al.  Unsupervised Domain Adaptation For Plant Organ Counting , 2020, ECCV Workshops.

[338]  Sara Beery,et al.  The iWildCam 2020 Competition Dataset , 2020, ArXiv.

[339]  R. Thomas McCoy,et al.  BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance , 2019, BLACKBOXNLP.

[340]  Sara Beery,et al.  Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[341]  Trevor Darrell,et al.  BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling , 2018, ArXiv.

[342]  Alexander D'Amour,et al.  Fairness is not static: deeper understanding of long term fairness via simulation studies , 2020, FAT*.

[343]  Benjamin Recht,et al.  Measuring Robustness to Natural Distribution Shifts in Image Classification , 2020, NeurIPS.

[344]  J. Leskovec,et al.  Open Graph Benchmark: Datasets for Machine Learning on Graphs , 2020, NeurIPS.

[345]  Avanti Shrikumar,et al.  Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation , 2020, ICML.

[346]  C. Lawrence Zitnick,et al.  The Open Catalyst 2020 (OC20) Dataset and Community Challenges , 2020, Proceedings of the International Conference on Electrocatalysis for Energy Applications and Sustainable Chemicals.

[347]  Francis M. Tyers,et al.  Common Voice: A Massively-Multilingual Speech Corpus , 2019, LREC.

[348]  Xia Tian,et al.  Machine learning on DNA-encoded libraries: A new paradigm for hit-finding , 2020, Journal of medicinal chemistry.

[349]  Noah D. Goodman,et al.  Variational Item Response Theory: Fast, Accurate, and Expressive , 2020, EDM.

[350]  Joseph D. Janizek,et al.  AI for radiographic COVID-19 detection selects shortcuts over signal , 2020, Nature Machine Intelligence.

[351]  David B. Lobell,et al.  Weakly Supervised Deep Learning for Segmentation of Remote Sensing Imagery , 2020, Remote Sensing.

[352]  Alexander D'Amour,et al.  Underspecification Presents Challenges for Credibility in Modern Machine Learning , 2020, J. Mach. Learn. Res.

[353]  Sameera S. Ponda,et al.  Autonomous navigation of stratospheric balloons using reinforcement learning , 2020, Nature.

[354]  Hao Tan,et al.  The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions , 2020, EMNLP.

[355]  Eunsol Choi,et al.  TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages , 2020, Transactions of the Association for Computational Linguistics.

[356]  Anne Driscoll,et al.  Using publicly available satellite imagery and deep learning to understand economic well-being in Africa , 2020, Nature Communications.

[357]  Diego H. Milone,et al.  Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis , 2020, Proceedings of the National Academy of Sciences.

[358]  T. Jaakkola,et al.  Enforcing Predictive Invariance across Structured Biomedical Domains , 2020 .

[359]  Christopher Ré,et al.  No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems , 2020, NeurIPS.

[360]  Tengyu Ma,et al.  Understanding Self-Training for Gradual Domain Adaptation , 2020, ICML.

[361]  Ian Stavness,et al.  AutoCount: Unsupervised Segmentation and Counting of Organs in Field Images , 2020, ECCV Workshops.

[362]  Tian Li,et al.  Fair Resource Allocation in Federated Learning , 2019, ICLR.

[363]  Sergey Levine,et al.  Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift , 2020, ArXiv.

[364]  Dawn Song,et al.  Pretrained Transformers Improve Out-of-Distribution Robustness , 2020, ACL.

[365]  Stefan Schneider,et al.  Counting Fish and Dolphins in Sonar Images Using Deep Learning , 2020, ArXiv.

[366]  Trevor Darrell,et al.  Fully Test-time Adaptation by Entropy Minimization , 2020, ArXiv.

[367]  Michael I. Jordan,et al.  Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information , 2020, AISTATS.

[368]  Alexei A. Efros,et al.  Test-Time Training with Self-Supervision for Generalization under Distribution Shifts , 2019, ICML.

[369]  Finale Doshi-Velez,et al.  The myth of generalisability in clinical research and machine learning in health care , 2020, The Lancet Digital Health.

[370]  David S. Melnick,et al.  International evaluation of an AI system for breast cancer screening , 2020, Nature.

[371]  Tal Linzen,et al.  COGS: A Compositional Generalization Challenge Based on Semantic Interpretation , 2020, EMNLP.

[372]  Zachary Chase Lipton,et al.  Learning the Difference that Makes a Difference with Counterfactually-Augmented Data , 2019, ICLR.

[373]  Sorelle A. Friedler,et al.  Fairness warnings and fair-MAML: learning fairly with minimal data , 2019, FAT*.

[374]  Orhan Firat,et al.  XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization , 2020, ICML.

[375]  John Duchi,et al.  Distributionally Robust Losses for Latent Covariate Mixtures , 2020, ArXiv.

[376]  Cyrill Stachniss,et al.  Unsupervised Domain Adaptation for Transferring Plant Classification Systems to New Field Environments, Crops, and Robots , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[377]  Peyman Moghadam,et al.  Scalable learning for bridging the species gap in image-based plant phenotyping , 2020, Comput. Vis. Image Underst.

[378]  Percy Liang,et al.  Selective Question Answering under Domain Shift , 2020, ACL.

[379]  C. Lawrence Zitnick,et al.  An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage , 2020, ArXiv.

[380]  Suchi Saria,et al.  Evaluating Model Robustness to Dataset Shift , 2020, ArXiv.

[381]  Andrew Y. Ng,et al.  CheXphoto: 10,000+ Smartphone Photos and Synthetic Photographic Transformations of Chest X-rays for Benchmarking Deep Learning Robustness , 2020, ArXiv.

[382]  S. Chapman,et al.  Breeder friendly phenotyping , 2020, Plant science: an international journal of experimental plant biology.

[383]  Yulia Tsvetkov,et al.  Fortifying Toxic Speech Detectors Against Veiled Toxicity , 2020, EMNLP.

[384]  R. Almond,et al.  Living Planet Report 2020 - Bending the curve of biodiversity loss , 2020 .

[385]  Kristina Lerman,et al.  A Survey on Bias and Fairness in Machine Learning , 2019, ACM Comput. Surv.

[386]  D. Song,et al.  The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[387]  Alexander D'Amour,et al.  On Robustness and Transferability of Convolutional Neural Networks , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[388]  Timnit Gebru,et al.  Datasheets for datasets , 2018, Commun. ACM.

[389]  Tengyu Ma,et al.  In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness , 2020, ICLR.

[390]  Jasper Snoek,et al.  Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures , 2020, Entropy.

[391]  Michael J Taylor,et al.  Spatially Resolved Mass Spectrometry at the Single Cell: Recent Innovations in Proteomics and Metabolomics , 2021, Journal of the American Society for Mass Spectrometry.

[392]  Percy Liang,et al.  Selective Classification Can Magnify Disparities Across Groups , 2020, ICLR.

[393]  B. Recht,et al.  Do Image Classifiers Generalize Across Time? , 2019, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[394]  Peng Cui,et al.  Towards Non-I.I.D. image classification: A dataset and baselines , 2019, Pattern Recognit.

[395]  Tengyu Ma,et al.  Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization , 2020, ICLR.

[396]  Marzyeh Ghassemi,et al.  CheXclusion: Fairness gaps in deep chest X-ray classifiers , 2020, PSB.

[397]  Aleksander Madry,et al.  Noise or Signal: The Role of Image Backgrounds in Object Recognition , 2020, ICLR.

[398]  David Lopez-Paz,et al.  In Search of Lost Domain Generalization , 2020, ICLR.

[399]  Aleksander Madry,et al.  BREEDS: Benchmarks for Subpopulation Shift , 2020, ICLR.

[400]  Karan Goel,et al.  Model Patching: Closing the Subgroup Performance Gap with Data Augmentation , 2020, ICLR.

[401]  Christina Heinze-Deml,et al.  Conditional variance penalties and domain shift robustness , 2017, Machine Learning.

[402]  Frédéric Baret,et al.  Global Wheat Head Dataset 2021: an update to improve the benchmarking wheat head localization with more diversity , 2021, ArXiv.

[403]  Marzyeh Ghassemi,et al.  Ethical Machine Learning in Health Care , 2020, Annual review of biomedical data science.

[404]  S. Levine,et al.  BADGR: An Autonomous Self-Supervised Learning-Based Navigation System , 2020, IEEE Robotics and Automation Letters.