Universality and Limitations of Prompt Tuning
[1] Naman Goyal, et al. LLaMA: Open and Efficient Foundation Language Models, 2023, arXiv.
[2] D. Schuurmans, et al. What Learning Algorithm Is In-Context Learning? Investigations with Linear Models, 2022, ICLR.
[3] Sanjeev Arora, et al. A Kernel-Based View of Language Model Fine-Tuning, 2022, ICML.
[4] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[5] Sashank J. Reddi, et al. Robust Training of Neural Networks Using Scale Invariant Architectures, 2022, ICML.
[6] Colin Wei, et al. Statistically Meaningful Approximation: A Case Study on Approximating Turing Machines with Transformers, 2021, NeurIPS.
[7] Yoav Goldberg, et al. BitFit: Simple Parameter-Efficient Fine-Tuning for Transformer-Based Masked Language Models, 2021, ACL.
[8] Yelong Shen, et al. LoRA: Low-Rank Adaptation of Large Language Models, 2021, ICLR.
[9] Sang Michael Xie, et al. Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning, 2021, NeurIPS.
[10] Brian Lester, et al. The Power of Scale for Parameter-Efficient Prompt Tuning, 2021, EMNLP.
[11] Kevin Scaman, et al. Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks, 2021, ICML.
[12] Andreas Loukas, et al. Attention Is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth, 2021, ICML.
[13] Samy Bengio, et al. Understanding Deep Learning (Still) Requires Rethinking Generalization, 2021, Commun. ACM.
[14] Armen Aghajanyan, et al. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning, 2020, ACL.
[15] Andriy Mnih, et al. The Lipschitz Constant of Self-Attention, 2020, ICML.
[16] Joe Davison, et al. Compacter: Efficient Low-Rank Hypercomplex Adapter Layers, 2021, NeurIPS.
[17] P. Barceló, et al. Attention Is Turing-Complete, 2021, J. Mach. Learn. Res.
[18] Percy Liang, et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation, 2021, ACL.
[19] Sashank J. Reddi, et al. Why Are Adaptive Methods Good for Attention Models?, 2020, NeurIPS.
[20] Mark Chen, et al. Language Models Are Few-Shot Learners, 2020, NeurIPS.
[21] Sashank J. Reddi, et al. Are Transformers Universal Approximators of Sequence-to-Sequence Functions?, 2019, ICLR.
[22] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.
[23] David Duvenaud, et al. Invertible Residual Networks, 2018, ICML.
[24] Suvrit Sra, et al. Small ReLU Networks Are Powerful Memorizers: A Tight Analysis of Memorization Capacity, 2018, NeurIPS.
[25] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[26] Matthias Hein, et al. Optimization Landscape and Expressivity of Deep CNNs, 2017, ICML.
[27] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[28] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[29] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[30] Philipp Koehn, et al. Findings of the 2014 Workshop on Statistical Machine Translation, 2014, WMT@ACL.
[31] Guang-Bin Huang, et al. Learning Capability and Storage Capacity of Two-Hidden-Layer Feedforward Networks, 2003, IEEE Trans. Neural Networks.
[32] Guang-Bin Huang, et al. Upper Bounds on the Number of Hidden Neurons in Feedforward Networks with Arbitrary Bounded Nonlinear Activation Functions, 1998, IEEE Trans. Neural Networks.
[33] Masami Yamasaki, et al. The Lower Bound of the Capacity for a Neural Network with Multiple Hidden Layers, 1993.
[34] Y. F. Huang, et al. Bounds on Number of Hidden Neurons of Multilayer Perceptrons in Classification and Recognition, 1990, IEEE International Symposium on Circuits and Systems.