On the Gap between Adoption and Understanding in NLP

Several trends in current NLP research risk hampering the free development of scientific inquiry in our field. We identify five of particular concern: 1) the adoption of new methods before they are sufficiently understood or analyzed; 2) the preference for computational methods regardless of the risks posed by their limitations; 3) the resulting bias in which papers we publish; 4) the impossibility of re-running some experiments due to their cost; 5) the dangers of unexplainable methods. If these issues are not addressed, we risk a loss of reproducibility and reputability, and ultimately of public trust in our field. In this position paper, we outline each of these points and suggest ways forward.
