Towards Interactive Language Modeling

Interaction between caregivers and children plays a critical role in human language acquisition and development. Given this observation, it is remarkable that explicit interaction plays little to no role in artificial language modeling, a field that likewise targets the acquisition of human language, albeit by artificial models. Moreover, an interactive approach to language modeling has the potential to make language models substantially more versatile and to considerably impact downstream applications. Motivated by these considerations, we pioneer the space of interactive language modeling. As a first contribution, we present a road map that details the steps to be taken towards interactive language modeling. We then lead by example and take the first steps on this road map, showing the initial feasibility of our approach. As such, this work aims to be the start of a larger research agenda on interactive language modeling.
