Paper Information - Prompt Optimization of Large Language Model for Interactive Tasks without Gradient and Demonstrations

Large language models (LLMs) have demonstrated remarkable language proficiency, but they struggle to solve interactive tasks on their own. Existing methods either rely on gradient access, which is often unavailable for state-of-the-art LLMs such as GPT-4, or require diverse, high-quality in-context demonstrations. In this study, we propose LLM-PO, a novel approach that enables LLMs to address these tasks without gradient access or extensive demonstrations. The key idea is to maintain a text-based plan and to iterate over three steps: ask the LLM to reflect on the pros and cons of the current plan based on experience collected with it, update the plan accordingly, and collect more experience with the new plan. Experiments on HotpotQA demonstrate that LLM-PO achieves success rates higher than or on par with in-context learning (ICL) baselines while incurring lower inference cost.
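
The reflect-update-collect loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `optimize_plan`, its parameters, and the `toy_*` stand-ins are all hypothetical names introduced here. In a real setting, `collect` would presumably run the LLM agent in the environment (for example, on HotpotQA questions), and `reflect` and `update` would be LLM calls; here they are simple string rules so the sketch runs without any model.

```python
from typing import Callable, List, Tuple

# One experience: (trajectory text, whether the episode succeeded).
Experience = Tuple[str, bool]


def optimize_plan(
    initial_plan: str,
    collect: Callable[[str], List[Experience]],
    reflect: Callable[[str, List[Experience]], str],
    update: Callable[[str, str], str],
    iterations: int = 3,
) -> Tuple[str, List[float]]:
    """Iteratively refine a text plan without gradients (hypothetical sketch)."""
    plan = initial_plan
    success_rates: List[float] = []
    for _ in range(iterations):
        # 1. Collect experience by acting under the current plan.
        experiences = collect(plan)
        successes = sum(ok for _, ok in experiences)
        success_rates.append(successes / max(len(experiences), 1))
        # 2. Reflect on the pros and cons of the plan given that experience.
        feedback = reflect(plan, experiences)
        # 3. Update the plan text based on the reflection.
        plan = update(plan, feedback)
    return plan, success_rates


# Toy stand-ins (assumptions for illustration only, not real LLM calls).
def toy_collect(plan: str) -> List[Experience]:
    # Pretend that episodes succeed only if the plan mentions verification.
    ok = "verify" in plan
    return [(f"episode {i} under plan: {plan}", ok) for i in range(2)]


def toy_reflect(plan: str, experiences: List[Experience]) -> str:
    if all(ok for _, ok in experiences):
        return "Pros: answers were correct. Cons: none observed."
    return "Cons: answers were not checked; verify the answer before finishing."


def toy_update(plan: str, feedback: str) -> str:
    # Append a verification step only when the reflection asks for it.
    if "verify" in feedback and "verify" not in plan:
        return plan + " Then verify the answer."
    return plan


final_plan, rates = optimize_plan(
    "Search for the entity.", toy_collect, toy_reflect, toy_update, iterations=2
)
print(final_plan)
print(rates)
```

The only state carried between iterations is the plan text itself, which is why an approach of this shape needs neither gradient access nor a pool of demonstrations; how the actual prompts for reflection and updating are worded is not specified in the abstract and is left open here.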

Lei Li | Siqi Ouyang
