Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
[1] D. Schuurmans, et al. What learning algorithm is in-context learning? Investigations with linear models, 2022, ICLR.
[2] Percy Liang, et al. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes, 2022, NeurIPS.
[3] J. Schmidhuber, et al. The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention, 2022, ICML.
[4] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[5] Yejin Choi, et al. PIQA: Reasoning about Physical Commonsense in Natural Language, 2019, AAAI.
[6] Judith Tonhauser, et al. The CommitmentBank: Investigating projection in naturally occurring discourse, 2019.
[7] M. Aizerman, et al. Theoretical foundation of potential functions method in pattern recognition, 2019.
[8] Oren Etzioni, et al. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge, 2018, ArXiv.
[9] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[10] Xiang Zhang, et al. Character-level Convolutional Networks for Text Classification, 2015, NIPS.
[11] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[12] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[13] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.
[14] Marcus B. Perry, et al. The Exponentially Weighted Moving Average, 2010.
[15] Bo Pang, et al. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales, 2005, ACL.
[16] Bo Pang, et al. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts, 2004, ACL.
[17] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.