Studying word order through iterative shuffling

As neural language models approach human performance on NLP benchmark tasks, their advances are widely seen as evidence of an increasingly complex understanding of syntax. This view rests upon a hypothesis that has not yet been empirically tested: that word order encodes meaning essential to performing these tasks. We refute this hypothesis in many cases: in the GLUE suite and in various genres of English text, the words in a sentence or phrase can rarely be permuted to form a phrase carrying substantially different information. Our surprising result relies on inference by iterative shuffling (IBIS), a novel, efficient procedure that finds the ordering of a bag of words having the highest likelihood under a fixed language model. IBIS can use any black-box model without additional training and is superior to existing word ordering algorithms. Coalescing our findings, we discuss how shuffling inference procedures such as IBIS can benefit language modeling and constrained generation.
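
For intuition, here is a minimal sketch of what a shuffling-based search over word orderings can look like, assuming a user-supplied black-box `log_likelihood` scorer (e.g., the total log-probability of a token sequence under a pretrained language model). This is a simplified greedy hill climb with random restarts for illustration, not the exact IBIS procedure described in the paper.

```python
import random
from typing import Callable, List

def iterative_shuffle(words: List[str],
                      log_likelihood: Callable[[List[str]], float],
                      n_restarts: int = 5,
                      max_steps: int = 200,
                      seed: int = 0) -> List[str]:
    """Search over orderings of a bag of words, returning the ordering
    with the highest score under a black-box language-model scorer.

    `log_likelihood` is assumed to map a list of words to a float score;
    the search itself needs no access to model internals or training.
    """
    if len(words) < 2:
        return list(words)

    rng = random.Random(seed)
    best_order, best_score = list(words), log_likelihood(words)

    for _ in range(n_restarts):
        order = list(words)
        rng.shuffle(order)                       # random restart
        score = log_likelihood(order)

        for _ in range(max_steps):
            i, j = rng.sample(range(len(order)), 2)
            candidate = list(order)
            candidate[i], candidate[j] = candidate[j], candidate[i]   # propose a swap
            cand_score = log_likelihood(candidate)
            if cand_score > score:               # keep only improving moves
                order, score = candidate, cand_score

        if score > best_score:
            best_order, best_score = order, score

    return best_order
```

Because the scorer is treated as a black box, any fixed language model can be plugged in without additional training; only the proposal and acceptance scheme distinguishes this toy hill climb from more efficient search procedures.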
