On Optimal Caching and Model Multiplexing for Large Model Inference
[1] Nandan Thakur, et al. Evaluating Embedding APIs for Information Retrieval, 2023, ACL.
[2] James Y. Zou, et al. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance, 2023, arXiv.
[3] Zhi Rui Tam, et al. OpenAssistant Conversations - Democratizing Large Language Model Alignment, 2023, arXiv.
[4] Marco Tulio Ribeiro, et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4, 2023, arXiv.
[5] E. Horvitz, et al. Capabilities of GPT-4 on Medical Challenge Problems, 2023, arXiv.
[6] Nanyang Technological University, et al. A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT, 2023, arXiv.
[7] Geoffrey Irving, et al. Accelerating Large Language Model Decoding with Speculative Sampling, 2023, arXiv.
[8] Y. Matias, et al. Fast Inference from Transformers via Speculative Decoding, 2022, ICML.
[9] N. Karamchandani, et al. Regret-Optimal Online Caching for Adversarial and Stochastic Arrivals, 2022, VALUETOOLS.
[10] Dan Alistarh, et al. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, 2022, arXiv.
[11] J. Dean, et al. A Review of Sparse Expert Models in Deep Learning, 2022, arXiv.
[12] Doug Downey, et al. Embedding Recycling for Language Models, 2022, Findings of EACL.
[13] J. Dean, et al. Emergent Abilities of Large Language Models, 2022, Trans. Mach. Learn. Res.
[14] Xi Victoria Lin, et al. OPT: Open Pre-trained Transformer Language Models, 2022, arXiv.
[15] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[16] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[17] Shenghuang He, et al. A Flexible Multi-Task Model for BERT Serving, 2021, ACL.
[18] Qi Zhang, et al. Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead, 2021, Neural Networks.
[19] David R. So, et al. Carbon Emissions and Large Neural Network Training, 2021, arXiv.
[20] Michael W. Mahoney, et al. A Survey of Quantization Methods for Efficient Neural Network Inference, 2021, Low-Power Computer Vision.
[21] Stuart J. Russell, et al. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism, 2021, IEEE Transactions on Information Theory.
[22] Junaid Shuja, et al. Applying machine learning techniques for caching in next-generation edge networks: A comprehensive survey, 2021, J. Netw. Comput. Appl.
[23] Abhishek Sinha, et al. Online Caching with Optimal Switching Regret, 2021, IEEE International Symposium on Information Theory (ISIT).
[24] Zhuoran Yang, et al. Is Pessimism Provably Efficient for Offline RL?, 2020, ICML.
[25] Veselin Stoyanov, et al. General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference, 2020, Findings of EMNLP.
[26] Yoav Shoham, et al. The Cost of Training NLP Models: A Concise Overview, 2020, arXiv.
[27] Srinivas Shakkottai, et al. Learning to Cache and Caching to Learn: Regret Analysis of Caching Algorithms, 2020, IEEE/ACM Transactions on Networking.
[28] Wei-Cheng Chang, et al. Pre-training Tasks for Embedding-based Large-scale Retrieval, 2020, ICLR.
[29] Tom B. Brown, et al. Fine-Tuning Language Models from Human Preferences, 2019, arXiv.
[30] Gang Feng, et al. Multi-Agent Reinforcement Learning for Efficient Content Caching in Mobile D2D Networks, 2019, IEEE Transactions on Wireless Communications.
[31] Tapani Ristaniemi, et al. Learn to Cache: Machine Learning for Network Edge Caching in the Big Data Era, 2018, IEEE Wireless Communications.
[32] Victor C. M. Leung, et al. Deep-Reinforcement-Learning-Based Optimization for Cache-Enabled Opportunistic Interference Alignment Wireless Networks, 2017, IEEE Transactions on Vehicular Technology.
[33] Angeliki Lazaridou, et al. The LAMBADA dataset: Word prediction requiring a broad discourse context, 2016, ACL.
[34] Björn Buchhold, et al. Semantic Search on Text and Knowledge Bases, 2016, Found. Trends Inf. Retr.
[35] Swadhesh Kumar, et al. An overview of modern cache memory and performance analysis of replacement policies, 2016, IEEE International Conference on Engineering and Technology (ICETECH).
[36] Hyokyung Bahn, et al. Web cache management based on the expected cost of web objects, 2005, Inf. Softw. Technol.
[37] Sang Lyul Min, et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies, 2001, IEEE Trans. Computers.
[38] Azer Bestavros, et al. Popularity-aware greedy dual-size Web proxy caching algorithms, 2000, Proceedings of the 20th IEEE International Conference on Distributed Computing Systems.
[39] Martin F. Arlitt, et al. Evaluating content management techniques for Web proxy caches, 2000, ACM SIGMETRICS Perform. Eval. Rev.
[40] Jia Wang, et al. A survey of web caching schemes for the Internet, 1999, ACM SIGCOMM Comput. Commun. Rev.
[41] William Stallings, et al. Operating Systems: Internals and Design Principles, 1991.
[42] Jianqing Fan, et al. High-Dimensional Statistics, 2014.
[43] K. Kavi. Cache Memories: Cache Memories in Uniprocessors; Reading versus Writing; Improving Performance, 2022.