Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data