Inference with Reference: Lossless Acceleration of Large Language Models
Furu Wei | Nan Yang | Linjun Yang | Rangan Majumder | Tao Ge | Binxing Jiao | Daxin Jiang | Liang Wang