Squib: Reproducibility in Computational Linguistics: Are We Willing to Share?

This study focuses on an essential precondition for reproducibility in computational linguistics: the willingness of authors to share relevant source code and data. Ten years after Ted Pedersen’s influential “Last Words” contribution in Computational Linguistics, we investigate to what extent researchers in computational linguistics are willing and able to share their data and code. We surveyed all 395 full papers presented at the 2011 and 2016 ACL Annual Meetings and identified whether links to data and code were provided. Where no working links were provided, we asked the authors for this information. Although data were often available, code was shared less often. When a paper lacked working links to code or data, authors provided the code in about one third of cases. For a selection of ten papers, we attempted to reproduce the results using the provided data and code. We were able to approximately reproduce the results for six papers; for only a single paper did we obtain exactly the same results. Our findings show that even though the situation appears to have improved from 2011 to 2016, empiricism in computational linguistics still largely remains a matter of faith. Nevertheless, we are somewhat optimistic about the future. Ensuring reproducibility is not only important for the field as a whole, but also seems worthwhile for individual researchers: the median citation count is higher for studies with working links to the source code.

[1] Naoaki Okazaki, et al. Learning Semantically and Additively Compositional Distributional Representations, 2016, ACL.

[2] C. Drummond. Replicability is not Reproducibility: Nor is it Good Science, 2009.

[3] Dan Klein, et al. Learning Dependency-Based Compositional Semantics, 2011, CL.

[4] Hwee Tou Ng, et al. Translating from Morphologically Complex Languages: A Paraphrase-Based Approach, 2011, ACL.

[5] Ted Pedersen, et al. Empiricism Is Not a Matter of Faith, 2008, Computational Linguistics.

[6] Maximin Coavoux, et al. Neural Greedy Constituent Parsing with Dynamic Oracles, 2016, ACL.

[7] Regina Barzilay, et al. Learning to Win by Reading Manuals in a Monte-Carlo Framework, 2011, ACL.

[8] Harith Alani, et al. Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification, 2011, ACL.

[9] Eric P. Xing, et al. Harnessing Deep Neural Networks with Logic Rules, 2016, ACL.

[10] Margot Mieskes, et al. A Quantitative Study of Data in the NLP community, 2017, EthNLP@EACL.

[11] Regina Barzilay, et al. Content Models with Attitude, 2011, ACL.

[12] Daniel M. Bikel, et al. Intricacies of Collins’ Parsing Model, 2004, CL.

[13] Piek T. J. M. Vossen, et al. Replicability and reproducibility of research results for human language technology: introducing an LRE special section, 2017, Lang. Resour. Evaluation.

[14] Shaohua Yang, et al. Physical Causality of Action Verbs in Grounded Language Understanding, 2016, ACL.

[15] Lorena A. Barba, et al. Terminologies for Reproducible Research, 2018, ArXiv.

[16] Antske Fokkens, et al. Offspring from Reproduction Problems: What Replication Failure Teaches Us, 2013, ACL.

[17] Grzegorz Kondrak, et al. Leveraging Inflection Tables for Stemming and Lemmatization, 2016, ACL.

[18] Alex M. Warren. Repeatability and Benefaction in Computer Systems Research — A Study and a Modest Proposal, 2015.

[19] Michael Collins, et al. Head-Driven Statistical Models for Natural Language Parsing, 2003, CL.