Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Stephanie C. Y. Chan, Adam Santoro, Andrew Kyle Lampinen, Jane X. Wang, Aaditya Singh, Pierre H. Richemond, James L. McClelland, Felix Hill
[1] Andrew Kyle Lampinen, et al. Zipfian environments for Reinforcement Learning, 2022, CoLLAs.
[2] M. Lewis, et al. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, 2022, EMNLP.
[3] Matt Gardner, et al. Impact of Pretraining Term Frequencies on Few-Shot Reasoning, 2022, arXiv.
[4] S. Gu, et al. Can Wikipedia Help Offline Reinforcement Learning?, 2022, arXiv.
[5] Sang Michael Xie, et al. An Explanation of In-context Learning as Implicit Bayesian Inference, 2021, ICLR.
[6] Ellie Pavlick, et al. Do Prompt-Based Models Really Understand the Meaning of Their Prompts?, 2021, NAACL.
[7] Stephen Clark, et al. Grounded Language Learning Fast and Slow, 2020, ICLR.
[8] J. Hopfield, et al. Large Associative Memory Problem in Neurobiology and Machine Learning, 2020, ICLR.
[9] David P. Kreil, et al. Hopfield Networks is All You Need, 2020, ICLR.
[10] Hinrich Schütze, et al. Placing language in an integrated understanding system: Next steps toward human-level performance in neural language models, 2020, Proceedings of the National Academy of Sciences.
[11] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[12] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, Journal of Machine Learning Research.
[13] Joshua B. Tenenbaum, et al. The Omniglot challenge: a 3-year progress report, 2019, Current Opinion in Behavioral Sciences.
[14] Linda B. Smith, et al. The Developing Infant Creates a Curriculum for Statistical Learning, 2018, Trends in Cognitive Sciences.
[15] François Fleuret, et al. Not All Samples Are Created Equal: Deep Learning with Importance Sampling, 2018, ICML.
[16] J. Saffran, et al. Infant Statistical Learning, 2018, Annual Review of Psychology.
[17] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[18] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[19] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[20] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[21] James L. McClelland, et al. What Learning Systems do Intelligent Agents Need? Complementary Learning Systems Theory Updated, 2016, Trends in Cognitive Sciences.
[22] Daan Wierstra, et al. Meta-Learning with Memory-Augmented Neural Networks, 2016, ICML.
[23] Oriol Vinyals, et al. Matching Networks for One Shot Learning, 2016, NIPS.
[24] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[25] S. Piantadosi. Zipf's word frequency law in natural language: A critical review and future directions, 2014, Psychonomic Bulletin & Review.
[26] Jean-Charles Delvenne, et al. Burstiness and spreading on temporal networks, 2013, arXiv.
[27] Filippo Menczer, et al. Modeling Statistical Properties of Written Text, 2009, PLoS ONE.
[28] Adilson E. Motter, et al. Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words, 2009, PLoS ONE.
[29] Taghi M. Khoshgoftaar, et al. Experimental perspectives on learning from imbalanced data, 2007, ICML.
[30] J.-P. Eckmann, et al. Hierarchical structures induce long-range dynamical correlations in written texts, 2006, Proceedings of the National Academy of Sciences.
[31] Paul H. Garthwaite, et al. A Bayesian Mixture Model for Term Re-occurrence and Burstiness, 2005, CoNLL.
[32] Nitesh V. Chawla, et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, Journal of Artificial Intelligence Research.
[33] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[34] James L. McClelland, et al. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory, 1995, Psychological Review.
[35] M. Neuts. The burstiness of point processes, 1993.
[36] L. Squire. Memory and the hippocampus: a synthesis from findings with rats, monkeys, and humans, 1992, Psychological Review.
[37] David Yarowsky. One Sense Per Discourse, 1992, HLT.
[38] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[39] George Kingsley Zipf. Human behavior and the principle of least effort, 1949.