A Survey of Large Language Models

Language is essentially a complex system of human expression governed by grammatical rules, and developing capable AI algorithms for comprehending and mastering a language poses a significant challenge. As a major approach, language modeling has been widely studied for language understanding and generation over the past two decades, evolving from statistical language models to neural language models. More recently, pre-trained language models (PLMs) have been proposed, which pre-train Transformer models over large-scale corpora and show strong capabilities in solving various NLP tasks. Since researchers have found that model scaling leads to performance improvements, they have further studied the scaling effect by increasing model size to ever larger scales. Interestingly, once the parameter scale exceeds a certain level, these enlarged language models not only achieve significant performance improvements but also exhibit special abilities that are absent in small-scale language models. To mark this difference in parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size. Research on LLMs has recently been advanced rapidly by both academia and industry, and a remarkable milestone is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs is having an important impact on the entire AI community and would revolutionize the way we develop and use AI algorithms. In this survey, we review the recent advances in LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs: pre-training, adaptation tuning, utilization, and capacity evaluation. We also summarize the available resources for developing LLMs and discuss remaining issues and future directions.
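
To make the notion of language modeling referred to above concrete, the task is conventionally framed as next-token prediction. The formulation below is a standard textbook sketch (the notation is illustrative, not taken from the survey itself): a token sequence is assigned a probability via the chain rule, and a model p_\theta is fit by maximum likelihood over a corpus. Statistical language models estimate the conditionals from n-gram counts under a Markov assumption, whereas neural language models, including Transformer-based PLMs and LLMs, parameterize them with a network trained on large-scale text.

% Chain-rule factorization of sequence probability, followed by the
% negative log-likelihood objective minimized during (pre-)training.
\begin{align}
  P(w_1, \dots, w_T) &= \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1}), \\
  \mathcal{L}(\theta) &= -\sum_{t=1}^{T} \log p_\theta(w_t \mid w_{<t}).
\end{align}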
