A Survey of Large Language Models
Wayne Xin Zhao | Z. Chen | J. Nie | Ji-rong Wen | Junyi Li | Ruiyang Ren | Kun Zhou | Xinyu Tang | Jinhao Jiang | Beichen Zhang | Tianyi Tang | Yifan Du | Zikang Liu | Yupeng Hou | Junjie Zhang | Yushuo Chen | Peiyu Liu | Chen Yang | Zican Dong | Xiaolei Wang | Yingqian Min | Yifan Li
[1] Oskar van der Wal,et al. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling , 2023, ArXiv.
[2] Le Sun,et al. ChatGPT Is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models , 2023, LREC.
[3] Meysam Alizadeh,et al. ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks , 2023, ArXiv.
[4] Michael Färber,et al. unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network , 2023, ArXiv.
[5] You Zhang,et al. ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge , 2023, ArXiv.
[6] R. Mak,et al. The utility of ChatGPT for cancer treatment information , 2023, medRxiv.
[7] Marco Tulio Ribeiro,et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4 , 2023, ArXiv.
[8] Haewoon Kwak,et al. Can we trust the evaluation on ChatGPT? , 2023, TRUSTNLP.
[9] E. Horvitz,et al. Capabilities of GPT-4 on Medical Challenge Problems , 2023, ArXiv.
[10] S. S. Gill,et al. Mind meets machine: Unravelling GPT-4's cognitive psychology , 2023, BenchCouncil Transactions on Benchmarks, Standards and Evaluations.
[11] Anton Firc,et al. On the Educational Impact of ChatGPT: Is Artificial Intelligence Ready to Obtain a University Degree? , 2023, ITiCSE.
[12] Wenpeng Yin,et al. Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning , 2023, ArXiv.
[13] Xuanjing Huang,et al. A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models , 2023, ArXiv.
[14] Zhongxiang Sun,et al. A Short Survey of Viewing Large Language Models in Legal Aspect , 2023, ArXiv.
[15] Henrique Pondé de Oliveira Pinto,et al. GPT-4 Technical Report , 2023, 2303.08774.
[16] Aixin Sun,et al. Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! , 2023, ArXiv.
[17] Yuxiong He,et al. A Comprehensive Study on Post-Training Quantization for Large Language Models , 2023, ArXiv.
[18] Thomas Lukasiewicz,et al. Consistency Analysis of ChatGPT , 2023, ArXiv.
[19] J. Tenenbaum,et al. Planning with Large Language Models for Code Generation , 2023, ArXiv.
[20] U. V. Luxburg,et al. ChatGPT Participates in a Computer Science Exam , 2023, ArXiv.
[21] Chenfei Wu,et al. Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models , 2023, ArXiv.
[22] Xiaoqian Jiang,et al. Does Synthetic Data Generation of LLMs Help Clinical Text Mining? , 2023, ArXiv.
[23] Philip S. Yu,et al. A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT , 2023, ArXiv.
[24] Mehdi S. M. Sajjadi,et al. PaLM-E: An Embodied Multimodal Language Model , 2023, ICML.
[25] Cuiyun Gao,et al. On the Feasibility of Specialized Ability Extracting for Large Language Code Models , 2023, ArXiv.
[26] Shima Imani,et al. MathPrompter: Mathematical Reasoning using Large Language Models , 2023, ACL.
[27] Björn Schuller,et al. Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT , 2023, ArXiv.
[28] A. Ananthaswamy. In AI, is bigger always better? , 2023, Nature.
[29] Xuanjing Huang,et al. How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks , 2023, ArXiv.
[30] Minlie Huang,et al. ChatGPT: potential, prospects, and limitations , 2023, Frontiers Inf. Technol. Electron. Eng..
[31] Naman Goyal,et al. LLaMA: Open and Efficient Foundation Language Models , 2023, ArXiv.
[32] Li Dong,et al. Language Is Not All You Need: Aligning Perception with Language Models , 2023, NeurIPS.
[33] Xipeng Qiu,et al. Finding Supporting Examples for In-Context Learning , 2023, ArXiv.
[34] Michel Galley,et al. Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback , 2023, ArXiv.
[35] Dragomir R. Radev,et al. ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics , 2023, ArXiv.
[36] Przemyslaw Kazienko,et al. ChatGPT: Jack of all trades, master of none , 2023, SSRN Electronic Journal.
[37] Juhua Liu,et al. Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT , 2023, ArXiv.
[38] Nanyang Technological University,et al. A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT , 2023, ArXiv.
[39] Benjamin Van Durme,et al. Can GPT-3 Perform Statutory Reasoning? , 2023, ArXiv.
[40] Dilek Z. Hakkani-Tür,et al. Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information , 2023, EACL.
[41] Luke Zettlemoyer,et al. Toolformer: Language Models Can Teach Themselves to Use Tools , 2023, NeurIPS.
[42] I. Jerković,et al. ChatGPT-3.5 as writing assistance in students’ essays , 2023, Humanities and Social Sciences Communications.
[43] Michihiro Yasunaga,et al. Is ChatGPT a General-Purpose Natural Language Processing Task Solver? , 2023, EMNLP.
[44] Dan Su,et al. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity , 2023, IJCNLP.
[45] A. Borji. A Categorical Archive of ChatGPT Failures , 2023, ArXiv.
[46] Olivier Sigaud,et al. Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning , 2023, ArXiv.
[47] M. Kosinski. Theory of Mind Might Have Spontaneously Emerged in Large Language Models , 2023, 2302.02083.
[48] Alexander J. Smola,et al. Multimodal Chain-of-Thought Reasoning in Language Models , 2023, ArXiv.
[49] Minlie Huang,et al. Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models , 2023, ICML.
[50] Quoc V. Le,et al. The Flan Collection: Designing Data and Methods for Effective Instruction Tuning , 2023, ICML.
[51] Chris Callison-Burch,et al. Faithful Chain-of-Thought Reasoning , 2023, ArXiv.
[52] Ashish Sabharwal,et al. Specializing Smaller Language Models towards Multi-Step Reasoning , 2023, ICML.
[53] G. Kortemeyer. Could an artificial-intelligence agent pass an introductory physics course? , 2023, Physical Review Physics Education Research.
[54] O. Nov,et al. Putting ChatGPT's Medical Advice to the (Turing) Test , 2023, medRxiv.
[55] Ziyuan Wang,et al. How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection , 2023, ArXiv.
[56] Matt Welsh. The End of Programming , 2022, Commun. ACM.
[57] Hiroaki Hayashi,et al. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing , 2021, ACM Comput. Surv..
[58] Yi Tay,et al. Efficient Transformers: A Survey , 2020, ACM Comput. Surv..
[59] A. V. Podolskiy,et al. PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing , 2023, ArXiv.
[60] Kristin E. Hickman,et al. ChatGPT Goes to Law School , 2023, SSRN Electronic Journal.
[61] Dimitris Papailiopoulos,et al. Transformers as Algorithms: Generalization and Implicit Model Selection in In-context Learning , 2023, ArXiv.
[62] David Ifeoluwa Adelani,et al. The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset , 2023, NeurIPS.
[63] M. Ingrisch,et al. ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports , 2022, European radiology.
[64] Xi Victoria Lin,et al. OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization , 2022, ArXiv.
[65] Kang Min Yoo,et al. Prompt-Augmented Linear Probing: Scaling Beyond The Limit of Few-shot In-Context Learners , 2022, ArXiv.
[66] Noah A. Smith,et al. Self-Instruct: Aligning Language Model with Self Generated Instructions , 2022, ArXiv.
[67] Kai-Wei Chang,et al. A Survey of Deep Learning for Mathematical Reasoning , 2022, ArXiv.
[68] Daniel Fried,et al. Execution-Based Evaluation for Open-Domain Code Generation , 2022, ArXiv.
[69] Li Dong,et al. Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers , 2023, Annual Meeting of the Association for Computational Linguistics.
[70] K. Chang,et al. Towards Reasoning in Large Language Models: A Survey , 2022, ArXiv.
[71] Se-Young Yun,et al. Large Language Models Are Reasoning Teachers , 2022, ArXiv.
[72] Omer Levy,et al. Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor , 2022, ACL.
[73] Shizhu He,et al. Large Language Models are reasoners with Self-Verification , 2022, ArXiv.
[74] Fei Huang,et al. Reasoning with Language Model Prompting: A Survey , 2022, ArXiv.
[75] Teo Susnjak. ChatGPT: The End of Online Exam Integrity? , 2022, ArXiv.
[76] D. Roth,et al. Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale , 2022, ArXiv.
[77] Lucie Charlotte Magister,et al. Teaching Small Language Models to Reason , 2022, ArXiv.
[78] A. Zhmoginov,et al. Transformers learn in-context by gradient descent , 2022, ArXiv.
[79] Jonathan Berant,et al. Diverse Demonstrations Improve In-context Compositional Generalization , 2022, ArXiv.
[80] M. Shanahan. Talking About Large Language Models , 2022, Commun. ACM.
[81] Frank Schilder,et al. Legal Prompt Engineering for Multilingual Legal Judgement Prediction , 2022, ArXiv.
[82] Frank Schilder,et al. Legal Prompting: Teaching a Language Model to Think Like a Lawyer , 2022, ArXiv.
[83] Mrinmaya Sachan,et al. Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions , 2022, arXiv.org.
[84] D. Schuurmans,et al. What learning algorithm is in-context learning? Investigations with linear models , 2022, ArXiv.
[85] Wayne Xin Zhao,et al. Dense Text Retrieval Based on Pretrained Language Models: A Survey , 2022, ACM Trans. Inf. Syst..
[86] Greg Durrett,et al. Complementary Explanations for Effective In-Context Learning , 2022, Annual Meeting of the Association for Computational Linguistics.
[87] Jamie Callan,et al. PAL: Program-aided Language Models , 2022, ICML.
[88] Luke Zettlemoyer,et al. DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation , 2022, ICML.
[89] Guillem Cucurull,et al. Galactica: A Large Language Model for Science , 2022, ArXiv.
[90] Christopher D. Manning,et al. Holistic Evaluation of Language Models , 2023, Annals of the New York Academy of Sciences.
[91] Alexander M. Rush,et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model , 2022, ArXiv.
[92] Yiming Zhang,et al. Active Example Selection for In-Context Learning , 2022, EMNLP.
[93] Dragomir R. Radev,et al. Crosslingual Generalization through Multitask Finetuning , 2022, ArXiv.
[94] Zheng Xin Yong,et al. What Language Model to Train if You Have One Million GPU Hours? , 2022, EMNLP.
[95] Yuhuai Wu,et al. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs , 2022, ICLR.
[96] Andrew M. Dai,et al. Scaling Instruction-Finetuned Language Models , 2022, ArXiv.
[97] Quoc V. Le,et al. Transcending Scaling Laws with 0.1% Extra Compute , 2022, EMNLP.
[98] S. Gu,et al. Large Language Models Can Self-Improve , 2022, EMNLP.
[99] Minlie Huang,et al. Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization , 2022, EMNLP.
[100] Quoc V. Le,et al. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them , 2022, ACL.
[101] Graham Neubig,et al. Language Models of Code are Few-Shot Commonsense Learners , 2022, EMNLP.
[102] D. Klein,et al. Re3: Generating Longer Stories With Recursive Reprompting and Revision , 2022, EMNLP.
[103] Alexander J. Smola,et al. Automatic Chain of Thought Prompting in Large Language Models , 2022, ICLR.
[104] Noah A. Smith,et al. Measuring and Narrowing the Compositionality Gap in Language Models , 2022, ArXiv.
[105] Hyung Won Chung,et al. Language Models are Multilingual Chain-of-Thought Reasoners , 2022, ICLR.
[106] P. Zhang,et al. GLM-130B: An Open Bilingual Pre-trained Model , 2022, ICLR.
[107] He He,et al. Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought , 2022, ICLR.
[108] Ashish Sabharwal,et al. Complexity-Based Prompting for Multi-Step Reasoning , 2022, ICLR.
[109] Lisa Anne Hendricks,et al. Improving alignment of dialogue agents via targeted human judgements , 2022, ArXiv.
[110] Tom B. Brown,et al. In-context Learning and Induction Heads , 2022, ArXiv.
[111] Keith B. Hall,et al. Promptagator: Few-shot Dense Retrieval From 8 Examples , 2022, ICLR.
[112] D. Fox,et al. ProgPrompt: Generating Situated Robot Task Plans using Large Language Models , 2022, 2023 IEEE International Conference on Robotics and Automation (ICRA).
[113] Hui Su,et al. WeLM: A Well-Read Pre-trained Language Model for Chinese , 2022, ArXiv.
[114] Aman Madaan,et al. Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango , 2022, ArXiv.
[115] Peter R. Florence,et al. Code as Policies: Language Model Programs for Embodied Control , 2022, ArXiv.
[116] John J. Nay. Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans , 2022, SSRN Electronic Journal.
[117] Noah A. Smith,et al. Selective Annotation Makes Language Models Better Few-Shot Learners , 2022, ICLR.
[118] Tom B. Brown,et al. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned , 2022, ArXiv.
[119] M. Lewis,et al. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale , 2022, ArXiv.
[120] Xifeng Yan,et al. Limitations of Language Models in Arithmetic and Symbolic Induction , 2022, ACL.
[121] Jane A. Yu,et al. Few-shot Learning with Retrieval Augmented Language Models , 2022, J. Mach. Learn. Res..
[122] Jack G. M. FitzGerald,et al. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model , 2022, ArXiv.
[123] Percy Liang,et al. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes , 2022, NeurIPS.
[124] P. Bhattacharyya,et al. ScienceQA: a novel resource for question answering on scholarly articles , 2022, International Journal on Digital Libraries.
[125] O. Winther,et al. Can large language models reason about medical questions? , 2022, Patterns.
[126] Yuhuai Wu,et al. Exploring Length Generalization in Large Language Models , 2022, NeurIPS.
[127] Shannon L. Spruit,et al. No Language Left Behind: Scaling Human-Centered Machine Translation , 2022, ArXiv.
[128] D. Schuurmans,et al. Rationale-Augmented Ensembles in Language Models , 2022, ArXiv.
[129] Yuhuai Wu,et al. Solving Quantitative Reasoning Problems with Language Models , 2022, NeurIPS.
[130] Kang Min Yoo,et al. Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator , 2022, ArXiv.
[131] J. Dean,et al. Emergent Abilities of Large Language Models , 2022, Trans. Mach. Learn. Res..
[132] Wayne Xin Zhao,et al. JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding , 2022, KDD.
[133] Gerard de Melo,et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models , 2022, ArXiv.
[134] Weizhu Chen,et al. On the Advance of Making Language Models Better Reasoners , 2022, ArXiv.
[135] Daniel Y. Fu,et al. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness , 2022, NeurIPS.
[136] Markus N. Rabe,et al. Autoformalization with Large Language Models , 2022, NeurIPS.
[137] S. Gu,et al. Large Language Models are Zero-Shot Reasoners , 2022, NeurIPS.
[138] Yao Zhao,et al. TALM: Tool Augmented Language Models , 2022, ArXiv.
[139] Yuhuai Wu,et al. Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers , 2022, NeurIPS.
[140] Tom B. Brown,et al. Scaling Laws and Interpretability of Learning from Repeated Data , 2022, ArXiv.
[141] D. Schuurmans,et al. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models , 2022, ICLR.
[142] Hyung Won Chung,et al. UL2: Unifying Language Learning Paradigms , 2022, ICLR.
[143] Lawrence C. McAfee,et al. Reducing Activation Recomputation in Large Transformer Models , 2022, MLSys.
[144] Xi Victoria Lin,et al. OPT: Open Pre-trained Transformer Language Models , 2022, ArXiv.
[145] Kyunghyun Cho,et al. On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model , 2022, NAACL.
[146] Andrew Kyle Lampinen,et al. Data Distributional Properties Drive Emergent In-Context Learning in Transformers , 2022, NeurIPS.
[147] Noah A. Smith,et al. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks , 2022, EMNLP.
[148] Stella Rose Biderman,et al. GPT-NeoX-20B: An Open-Source Autoregressive Language Model , 2022, BIGSCIENCE.
[149] Hyung Won Chung,et al. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? , 2022, ICML.
[150] Tom B. Brown,et al. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback , 2022, ArXiv.
[151] Sida I. Wang,et al. InCoder: A Generative Model for Code Infilling and Synthesis , 2022, ICLR.
[152] Andrew M. Dai,et al. PaLM: Scaling Language Modeling with Pathways , 2022, J. Mach. Learn. Res..
[153] S. Levine,et al. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances , 2022, CoRL.
[154] Lisa Anne Hendricks,et al. Training Compute-Optimal Large Language Models , 2022, ArXiv.
[155] S. Savarese,et al. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis , 2022, ICLR.
[156] Jacob Menick,et al. Teaching language models to support answers with verified quotes , 2022, ArXiv.
[157] D. Schuurmans,et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models , 2022, ICLR.
[158] Qun Liu,et al. Compression of Generative Pre-trained Language Models via Quantization , 2022, ACL.
[159] Huan Sun,et al. Iteratively Prompt Pre-trained Language Models for Chain of Thought , 2022, EMNLP.
[160] Angeliki Lazaridou,et al. Internet-augmented language models through few-shot prompting for open-domain question answering , 2022, ArXiv.
[161] Ryan J. Lowe,et al. Training language models to follow instructions with human feedback , 2022, NeurIPS.
[162] Li Dong,et al. DeepNet: Scaling Transformers to 1,000 Layers , 2022, ArXiv.
[163] Frank F. Xu,et al. A systematic evaluation of large language models of code , 2022, MAPS@PLDI.
[164] M. Lewis,et al. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? , 2022, Conference on Empirical Methods in Natural Language Processing.
[165] Florian Tramèr,et al. Quantifying Memorization Across Neural Language Models , 2022, ICLR.
[166] Colin Raffel,et al. Deduplicating Training Data Mitigates Privacy Risks in Language Models , 2022, ICML.
[167] David Bau,et al. Locating and Editing Factual Associations in GPT , 2022, NeurIPS.
[168] Cherepanov,et al. Competition-level code generation with AlphaCode , 2022, Science.
[169] Geoffrey Irving,et al. Red Teaming Language Models with Language Models , 2022, EMNLP.
[170] Jesse Michael Han,et al. Formal Mathematics Statement Curriculum Learning , 2022, ICLR.
[171] Alexander M. Rush,et al. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts , 2022, ACL.
[172] Blake A. Hechtman,et al. Unified Scaling Laws for Routed Language Models , 2022, ICML.
[173] Orhan Firat,et al. Examining Scaling and Transfer of Language Model Architectures for Machine Translation , 2022, ICML.
[174] Reza Yazdani Aminabadi,et al. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model , 2022, ArXiv.
[175] Dale Schuurmans,et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models , 2022, NeurIPS.
[176] Joseph Gonzalez,et al. Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning , 2022, OSDI.
[177] Weizhu Chen,et al. Reasoning Like Program Executors , 2022, EMNLP.
[178] Renelito Delos Santos,et al. LaMDA: Language Models for Dialog Applications , 2022, ArXiv.
[179] P. Abbeel,et al. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents , 2022, ICML.
[180] Niket Tandon,et al. Memory-assisted prompt editing to improve GPT-3 after deployment , 2022, EMNLP.
[181] Dragomir R. Radev,et al. UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models , 2022, EMNLP.
[182] Jonathan Berant,et al. Learning To Retrieve Prompts for In-Context Learning , 2021, NAACL.
[183] Quoc V. Le,et al. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts , 2021, ICML.
[184] Diego de Las Casas,et al. Improving language models by retrieving from trillions of tokens , 2021, ICML.
[185] Sanket Vaibhav Mehta,et al. ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning , 2021, ArXiv.
[186] Sang Michael Xie,et al. An Explanation of In-context Learning as Implicit Bayesian Inference , 2021, ICLR.
[187] M. Lewis,et al. MetaICL: Learning to Learn In Context , 2021, NAACL.
[188] Alexander M. Rush,et al. Multitask Prompted Training Enables Zero-Shot Task Generalization , 2021, ICLR.
[189] Owain Evans,et al. TruthfulQA: Measuring How Models Mimic Human Falsehoods , 2021, ACL.
[190] Quoc V. Le,et al. Finetuned Language Models Are Zero-Shot Learners , 2021, ICLR.
[191] Jesse Michael Han,et al. MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics , 2021, ICLR.
[192] Noah A. Smith,et al. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation , 2021, ICLR.
[193] Wayne Xin Zhao,et al. Complex Knowledge Base Question Answering: A Survey , 2021, IEEE Transactions on Knowledge and Data Engineering.
[194] Nicholas Carlini,et al. Deduplicating Training Data Makes Language Models Better , 2021, ACL.
[195] Marc'Aurelio Ranzato,et al. The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation , 2021, TACL.
[196] Yang You,et al. Tesseract: Parallelize the Tensor Parallelism Efficiently , 2021, ICPP.
[197] S. Riedel,et al. Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity , 2021, ACL.
[198] Hannaneh Hajishirzi,et al. Cross-Task Generalization via Natural Language Crowdsourcing Instructions , 2021, ACL.
[199] Li Dong,et al. Knowledge Neurons in Pretrained Transformers , 2021, ACL.
[200] Weizhu Chen,et al. What Makes Good In-Context Examples for GPT-3? , 2021, DEELIO.
[201] Liangming Pan,et al. KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base , 2020, ACL.
[202] Philipp Koehn,et al. Findings of the 2022 Conference on Machine Translation (WMT22) , 2022, WMT.
[203] Yuzhong Qu,et al. Logical Form Generation via Multi-task Learning for Complex Question Answering over Knowledge Bases , 2022, COLING.
[204] Chae-Gyun Lim,et al. Does GPT-3 Generate Empathetic Dialogues? A Novel In-Context Example Selection Method and Automatic Evaluation Metric for Empathetic Dialogue Generation , 2022, COLING.
[205] Junyuan Shang,et al. ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation , 2021, ArXiv.
[206] Jeff Wu,et al. WebGPT: Browser-assisted question-answering with human feedback , 2021, ArXiv.
[207] Po-Sen Huang,et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher , 2021, ArXiv.
[208] Dario Amodei,et al. A General Language Assistant as a Laboratory for Alignment , 2021, ArXiv.
[209] David Bieber,et al. Show Your Work: Scratchpads for Intermediate Computation with Language Models , 2021, ArXiv.
[210] Yang You,et al. Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training , 2021, ArXiv.
[211] Xiaodong Yi,et al. OneFlow: Redesign the Distributed Deep Learning Framework from Scratch , 2021, ArXiv.
[212] Mohammad Bavarian,et al. Training Verifiers to Solve Math Word Problems , 2021, ArXiv.
[213] Liang Xu,et al. Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning , 2021, ArXiv.
[214] Jan Leike,et al. Recursively Summarizing Books with Human Feedback , 2021, ArXiv.
[215] Kyungduk Kim,et al. What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers , 2021, EMNLP.
[216] Charles Sutton,et al. Program Synthesis with Large Language Models , 2021, ArXiv.
[217] Silvio Savarese,et al. BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments , 2021, CoRL.
[218] Wojciech Zaremba,et al. Evaluating Large Language Models Trained on Code , 2021, ArXiv.
[219] Hao Tian,et al. ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation , 2021, ArXiv.
[220] Zhiyuan Liu,et al. CPM-2: Large-scale Cost-effective Pre-trained Language Models , 2021, AI Open.
[221] Yang You,et al. Maximizing Parallelism in Distributed Training for Huge Neural Networks , 2021, ArXiv.
[222] Yang You,et al. Sequence Parallelism: Long Sequence Training from System Perspective , 2021, ACL.
[223] Chang Zhou,et al. CogView: Mastering Text-to-Image Generation via Transformers , 2021, NeurIPS.
[224] Ji-Rong Wen,et al. Pretrained Language Models for Text Generation: A Survey , 2021, ArXiv.
[225] Dawn Song,et al. Measuring Coding Challenge Competence With APPS , 2021, NeurIPS Datasets and Benchmarks.
[226] Kaisheng M. Wang,et al. PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation , 2021, ArXiv.
[227] Jianlin Su,et al. RoFormer: Enhanced Transformer with Rotary Position Embedding , 2021, Neurocomputing.
[228] Xiang Ren,et al. CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP , 2021, EMNLP.
[229] Oyvind Tafjord,et al. Explaining Answers with Entailment Trees , 2021, EMNLP.
[230] Prateek Yadav,et al. ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning , 2021, EMNLP.
[231] Chaoyu Gong,et al. An Efficient 2D Method for Training Super-Large Deep Learning Models , 2021, ArXiv.
[232] Amar Phanishayee,et al. Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM , 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[233] Tom Everitt,et al. Alignment of Language Agents , 2021, ArXiv.
[234] Yejin Choi,et al. NaturalProofs: Mathematical Theorem Proving in Natural Language , 2021, NeurIPS Datasets and Benchmarks.
[235] Zhilin Yang,et al. FastMoE: A Fast Mixture-of-Expert Training System , 2021, ArXiv.
[236] Stella Biderman,et al. GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow , 2021 .
[237] Navin Goyal,et al. Are NLP Models really able to Solve Simple Math Word Problems? , 2021, NAACL.
[238] Roy Schwartz,et al. Random Feature Attention , 2021, ICLR.
[239] Hyung Won Chung,et al. Do Transformer Modifications Transfer Across Implementations and Applications? , 2021, EMNLP.
[240] D. Klein,et al. Calibrate Before Use: Improving Few-Shot Performance of Language Models , 2021, ICML.
[241] Miles Brundage,et al. Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models , 2021, ArXiv.
[242] Sonal Gupta,et al. Muppet: Massive Multi-task Representations with Pre-Finetuning , 2021, EMNLP.
[243] Noam M. Shazeer,et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity , 2021, J. Mach. Learn. Res..
[244] Jonathan Berant,et al. Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies , 2021, Transactions of the Association for Computational Linguistics.
[245] Charles Foster,et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling , 2020, ArXiv.
[246] Peter Clark,et al. ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language , 2020, FINDINGS.
[247] Colin Raffel,et al. Extracting Training Data from Large Language Models , 2020, USENIX Security Symposium.
[248] Brian M. Sadler,et al. Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases , 2020, WWW.
[249] Colin Raffel,et al. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer , 2020, NAACL.
[250] Dawn Song,et al. Measuring Massive Multitask Language Understanding , 2020, ICLR.
[251] Joachim Daiber,et al. MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering , 2020, Transactions of the Association for Computational Linguistics.
[252] Orhan Firat,et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding , 2020, ICLR.
[253] Mary Williamson,et al. Recipes for Building an Open-Domain Chatbot , 2020, EACL.
[254] Markus Freitag,et al. Findings of the 2021 Conference on Machine Translation (WMT21) , 2021, WMT.
[255] Yejin Choi,et al. proScript: Partially Ordered Scripts Generation , 2021, EMNLP.
[256] Yang You,et al. PatrickStar: Parallel Training of Pre-trained Models via a Chunk-based Memory Management , 2021, ArXiv.
[257] Claire Cardie,et al. WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization , 2020, FINDINGS.
[258] Samuel R. Bowman,et al. CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models , 2020, EMNLP.
[259] Ilya Sutskever,et al. Generative Language Modeling for Automated Theorem Proving , 2020, ArXiv.
[260] Ryan J. Lowe,et al. Learning to summarize from human feedback , 2020, NeurIPS.
[261] Olatunji Ruwase,et al. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters , 2020, KDD.
[262] M. Zaheer,et al. Big Bird: Transformers for Longer Sequences , 2020, NeurIPS.
[263] Ming-Wei Chang,et al. Retrieval Augmented Language Model Pre-Training , 2020, ICML.
[264] Keh-Yih Su,et al. A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers , 2020, ACL.
[265] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[266] Fabio Petroni,et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , 2020, NeurIPS.
[267] Ryan McDonald,et al. On Faithfulness and Factuality in Abstractive Summarization , 2020, ACL.
[268] Dian Yu,et al. CLUE: A Chinese Language Understanding Evaluation Benchmark , 2020, COLING.
[269] Xipeng Qiu,et al. Pre-trained models for natural language processing: A survey , 2020, Science China Technological Sciences.
[270] Eunsol Choi,et al. TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages , 2020, Transactions of the Association for Computational Linguistics.
[271] Ting Liu,et al. CodeBERT: A Pre-Trained Model for Programming and Natural Languages , 2020, FINDINGS.
[272] Noam Shazeer,et al. GLU Variants Improve Transformer , 2020, ArXiv.
[273] Tie-Yan Liu,et al. On Layer Normalization in the Transformer Architecture , 2020, ICML.
[274] Colin Raffel,et al. How Much Knowledge Can You Pack into the Parameters of a Language Model? , 2020, EMNLP.
[275] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[276] Jeremy Blackburn,et al. The Pushshift Reddit Dataset , 2020, ICWSM.
[277] Boaz Barak,et al. Deep double descent: where bigger models and more data hurt , 2019, ICLR.
[278] Luke Zettlemoyer,et al. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[279] Yejin Choi,et al. PIQA: Reasoning about Physical Commonsense in Natural Language , 2019, AAAI.
[280] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[281] Ashish Sabharwal,et al. QASC: A Dataset for Question Answering via Sentence Composition , 2019, AAAI.
[282] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[283] Samyam Rajbhandari,et al. ZeRO: Memory optimizations Toward Training Trillion Parameter Models , 2019, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[284] Ronan Le Bras,et al. WinoGrande: An Adversarial Winograd Schema Challenge at Scale , 2019, AAAI.
[285] Sophie Rosset,et al. DiaBLa: a corpus of bilingual spontaneous written dialogues for machine translation , 2019, Language Resources and Evaluation.
[286] Yejin Choi,et al. The Curious Case of Neural Text Degeneration , 2019, ICLR.
[287] Philipp Koehn,et al. Findings of the 2020 Conference on Machine Translation (WMT20) , 2020, WMT.
[288] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[289] Jens Lehmann,et al. LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia , 2019, SEMWEB.
[290] Rico Sennrich,et al. Root Mean Square Layer Normalization , 2019, NeurIPS.
[291] Yanjun Ma,et al. PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice , 2019 .
[292] Tom B. Brown,et al. Fine-Tuning Language Models from Human Preferences , 2019, ArXiv.
[293] M. Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[294] Sebastian Riedel,et al. Language Models as Knowledge Bases? , 2019, EMNLP.
[295] Marta R. Costa-jussà,et al. Findings of the 2019 Conference on Machine Translation (WMT19) , 2019, WMT.
[296] Ming-Wei Chang,et al. Natural Questions: A Benchmark for Question Answering Research , 2019, TACL.
[297] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[298] Seungwhan Moon,et al. OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs , 2019, ACL.
[299] Ben Goodrich,et al. Assessing The Factual Accuracy of Generated Text , 2019, KDD.
[300] Yejin Choi,et al. MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms , 2019, NAACL.
[301] Ali Farhadi,et al. Defending Against Neural Fake News , 2019, NeurIPS.
[302] Xiaodong Liu,et al. Unified Language Model Pre-training for Natural Language Understanding and Generation , 2019, NeurIPS.
[303] Omer Levy,et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems , 2019, NeurIPS.
[304] Ming-Wei Chang,et al. BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions , 2019, NAACL.
[305] Ali Farhadi,et al. HellaSwag: Can a Machine Really Finish Your Sentence? , 2019, ACL.
[306] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[307] Gabriel Stanovsky,et al. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs , 2019, NAACL.
[308] Xiaodong Liu,et al. Multi-Task Deep Neural Networks for Natural Language Understanding , 2019, ACL.
[309] Quoc V. Le,et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism , 2018, ArXiv.
[310] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[311] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[312] Jonathan Berant,et al. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge , 2019, NAACL.
[313] Rémi Louf,et al. Transformers: State-of-the-Art Natural Language Processing , 2019 .
[314] Yejin Choi,et al. Social IQA: Commonsense Reasoning about Social Interactions , 2019, EMNLP.
[315] Mirella Lapata,et al. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization , 2018, EMNLP.
[316] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[317] Peter Clark,et al. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering , 2018, EMNLP.
[318] Nikhil R. Devanur,et al. PipeDream: Fast and Efficient Pipeline Parallel DNN Training , 2018, ArXiv.
[319] Quoc V. Le,et al. A Simple Method for Commonsense Reasoning , 2018, ArXiv.
[320] Sanja Fidler,et al. VirtualHome: Simulating Household Activities Via Programs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[321] Bhavana Dalvi,et al. Tracking State Changes in Procedural Text: a Challenge Dataset and Models for Process Paragraph Comprehension , 2018, NAACL.
[322] Qingxiang Wang,et al. First Experiments with Neural Translation of Informal to Formal Mathematics , 2018, CICM.
[323] Rachel Rudinger,et al. Gender Bias in Coreference Resolution , 2018, NAACL.
[324] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[325] Noam Shazeer,et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , 2018, ICML.
[326] Oren Etzioni,et al. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge , 2018, ArXiv.
[327] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[328] Hao Wu,et al. Mixed Precision Training , 2017, ICLR.
[329] Ronald Kemker,et al. Measuring Catastrophic Forgetting in Neural Networks , 2017, AAAI.
[330] Pasquale Minervini,et al. Convolutional 2D Knowledge Graph Embeddings , 2017, AAAI.
[331] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[332] Frank Hutter,et al. Fixing Weight Decay Regularization in Adam , 2017, ArXiv.
[333] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[334] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[335] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
[336] Wang Ling,et al. Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems , 2017, ACL.
[337] Eunsol Choi,et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension , 2017, ACL.
[338] Jason Weston,et al. Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.
[339] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[340] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[341] Aidong Zhang,et al. A Survey on Context Learning , 2017, IEEE Transactions on Knowledge and Data Engineering.
[342] Jianfeng Gao,et al. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset , 2018 .
[343] Karin M. Verspoor,et al. Findings of the 2016 Conference on Machine Translation , 2016, WMT.
[344] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[345] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[346] Sandro Pezzelle,et al. The LAMBADA dataset: Word prediction requiring a broad discourse context , 2016, ACL.
[347] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[348] Hannaneh Hajishirzi,et al. MAWPS: A Math Word Problem Repository , 2016, NAACL.
[349] Jason Weston,et al. Key-Value Memory Networks for Directly Reading Documents , 2016, EMNLP.
[350] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[351] Tianqi Chen,et al. Training Deep Nets with Sublinear Memory Cost , 2016, ArXiv.
[352] Bowen Zhou,et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.
[353] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[354] Dan Roth,et al. Solving General Arithmetic Word Problems , 2016, EMNLP.
[355] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[356] Jason Weston,et al. A Neural Attention Model for Abstractive Sentence Summarization , 2015, EMNLP.
[357] Danqi Chen,et al. Observed versus latent features for knowledge base and text inference , 2015, CVSC.
[358] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[359] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[360] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[361] Fabian M. Suchanek,et al. YAGO3: A Knowledge Base from Multilingual Wikipedias , 2015, CIDR.
[362] Philipp Koehn,et al. Findings of the 2014 Workshop on Statistical Machine Translation , 2014, WMT@ACL.
[363] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[364] Andrew Chou,et al. Semantic Parsing on Freebase from Question-Answer Pairs , 2013, EMNLP.
[365] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[366] Zornitsa Kozareva,et al. SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning , 2011, *SEMEVAL.
[367] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..
[368] Lukás Burget,et al. Recurrent Neural Network Based Language Modeling in Meeting Recognition , 2011, INTERSPEECH.
[369] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[370] Praveen Paritosh,et al. Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.
[371] ChengXiang Zhai,et al. Statistical Language Models for Information Retrieval , 2008, NAACL.
[372] Thorsten Brants,et al. Large Language Models in Machine Translation , 2007, EMNLP.
[373] Gerhard Weikum,et al. YAGO: A Core of Semantic Knowledge , 2007, WWW.
[374] W. Bruce Croft,et al. Statistical language modeling for information retrieval , 2006, Annu. Rev. Inf. Sci. Technol..
[375] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL.
[376] Jianfeng Gao,et al. Introduction to the special issue on statistical language modeling , 2004, TALIP.
[377] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[378] Noam Chomsky,et al. The faculty of language: what is it, who has it, and how did it evolve? , 2002, Science.
[379] Andreas Stolcke,et al. SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.
[380] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[381] R. Rosenfeld,et al. Two decades of statistical language modeling: where do we go from here? , 2000, Proceedings of the IEEE.
[382] Mary P. Harper,et al. A Second-Order Hidden Markov Model for Part-of-Speech Tagging , 1999, ACL.
[383] Frederick Jelinek,et al. Statistical methods for speech recognition , 1997 .
[384] S. Pinker. The language instinct: how the mind creates language , 1995 .
[385] William A. Gale,et al. Good-Turing Frequency Estimation Without Tears , 1995, J. Quant. Linguistics.
[386] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.
[387] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[388] Lalit R. Bahl,et al. A tree-based statistical language model for natural language speech recognition , 1989, IEEE Trans. Acoust. Speech Signal Process..
[389] Michael McCloskey,et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .
[390] Tad Hogg,et al. Phase Transitions in Artificial Intelligence Systems , 1987, Artif. Intell..
[391] Slava M. Katz,et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..
[392] C. Cordell Green,et al. What Is Program Synthesis? , 1985, J. Autom. Reason..
[393] Zohar Manna,et al. Toward automatic program synthesis , 1971, Symposium on Semantics of Algorithmic Languages.
[394] Herbert A. Simon,et al. Experiments with a Heuristic Compiler , 1963, JACM.
[395] A. M. Turing,et al. Computing Machinery and Intelligence , 1950, The Philosophy of Artificial Intelligence.