Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions

Step-by-step reasoning approaches like chain-of-thought (CoT) have proved very effective at inducing reasoning capabilities in large language models. However, the success of CoT depends primarily on model size, and billion-parameter-scale models are often needed to get CoT to work. In this paper, we propose a knowledge distillation approach that leverages the step-by-step CoT reasoning capabilities of larger models and distills these reasoning abilities into smaller models. Our approach, DECOMPOSITIONAL DISTILLATION, learns a semantic decomposition of the original problem into a sequence of subproblems and uses it to train two models: (a) a problem decomposer that learns to decompose a complex reasoning problem into a sequence of simpler subproblems, and (b) a problem solver that uses these intermediate subproblems to solve the overall problem. On a multi-step math word problem dataset (GSM8K), our approach boosts the performance of GPT-2 variants by up to 35% compared to CoT distillation. We show that with our approach it is possible to train a GPT-2-large model (775M parameters) that outperforms a 10X larger GPT-3 (6B) model trained using CoT reasoning. Finally, we demonstrate that our problem decomposition can also serve as an alternative to CoT prompting, boosting GPT-3 performance by 40% compared to CoT prompts.
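
To make the two-model pipeline concrete, here is a minimal inference sketch in Python using HuggingFace transformers. The checkpoints, prompt formats, `[END]` stop marker, and greedy decoding settings are all assumptions for illustration; the abstract does not specify the actual fine-tuning formats.

```python
# Hedged sketch of the decomposer/solver pipeline described above.
# Assumptions: both models are GPT-2 checkpoints fine-tuned separately
# (one to emit subquestions, one to answer them), and the decomposer
# signals completion with a hypothetical "[END]" marker.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
decomposer = GPT2LMHeadModel.from_pretrained("gpt2-large")  # assumed fine-tuned decomposer
solver = GPT2LMHeadModel.from_pretrained("gpt2-large")      # assumed fine-tuned solver

def generate(model, prompt, max_new_tokens=64):
    """Greedy decoding helper; the paper's decoding settings are unspecified."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens,
                         pad_token_id=tokenizer.eos_token_id)
    # Return only the newly generated continuation.
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True).strip()

def solve(problem, max_steps=5):
    context = problem
    for _ in range(max_steps):
        # (a) Decomposer proposes the next subquestion given the problem
        #     and all previously answered subquestions.
        subq = generate(decomposer, context + "\nNext subquestion:")
        if "[END]" in subq:  # hypothetical stop marker
            break
        # (b) Solver answers the subquestion in context.
        suba = generate(solver, context + "\nQ: " + subq + "\nA:")
        context += f"\nQ: {subq}\nA: {suba}"
    # The final answer is the solver's answer to the last subquestion.
    return context

print(solve("Tom has 3 boxes with 4 apples each. He gives away 5. How many remain?"))
```

The design point the sketch illustrates is that decomposition and solving are trained as separate models, so the solver only ever has to answer simple subquestions rather than perform multi-step reasoning in one shot.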
