Mistral 7B

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length at reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B Instruct, that surpasses Llama 2 13B Chat on both human and automated benchmarks. Our models are released under the Apache 2.0 license.
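To make the two attention mechanisms concrete, below is a minimal, illustrative sketch of causal sliding-window attention combined with GQA-style key/value head sharing. This is not Mistral's released implementation (which relies on optimized attention kernels and a rolling-buffer KV cache); the function names and toy tensor shapes are assumptions chosen for clarity. In the actual model, 32 query heads share 8 K/V heads and the attention window is 4096 tokens.

```python
import torch
import torch.nn.functional as F

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend to position j iff i - window < j <= i."""
    i = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # (1, seq_len)
    return (j <= i) & (j > i - window)

def gqa_sliding_window_attention(q, k, v, n_kv_heads: int, window: int):
    """q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d).

    GQA: each group of n_heads // n_kv_heads query heads shares one K/V head.
    SWA: attention is restricted to a causal window of the last `window` tokens.
    """
    n_heads, seq_len, d = q.shape
    group = n_heads // n_kv_heads
    # Repeat each K/V head so every query head in a group sees the same K/V.
    k = k.repeat_interleave(group, dim=0)
    v = v.repeat_interleave(group, dim=0)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (n_heads, seq, seq)
    mask = sliding_window_causal_mask(seq_len, window)   # broadcast over heads
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                 # (n_heads, seq, d)

# Toy usage with made-up sizes (illustrative only).
q = torch.randn(8, 16, 32)   # 8 query heads
k = torch.randn(2, 16, 32)   # 2 shared K/V heads
v = torch.randn(2, 16, 32)
out = gqa_sliding_window_attention(q, k, v, n_kv_heads=2, window=4)
print(out.shape)  # torch.Size([8, 16, 32])
```

The window mask caps each token's per-layer attention cost at the window size rather than the full sequence length, while stacking layers lets information propagate beyond the window, so the effective receptive field grows with depth.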
