Controlled Evaluation of Grammatical Knowledge in Mandarin Chinese Language Models

Prior work has shown that structural supervision helps English language models learn generalizations about syntactic phenomena such as subject-verb agreement. However, it remains unclear whether such an inductive bias would also improve language models' ability to learn grammatical dependencies in typologically different languages. Here we investigate this question in Mandarin Chinese, which has a logographic, largely syllable-based writing system; different word order; and sparser morphology than English. We train LSTMs, Recurrent Neural Network Grammars, Transformer language models, and Transformer-parameterized generative parsing models on two Mandarin Chinese datasets of different sizes. We evaluate the models' ability to learn different aspects of Mandarin grammar that assess syntactic and semantic relationships. We find suggestive evidence that structural supervision helps with representing syntactic state across intervening content and improves performance in low-data settings, suggesting that the benefits of hierarchical inductive biases in acquiring dependency relationships may extend beyond English.
