Prompt Sapper: A LLM-Empowered Production Tool for Building AI Chains

The emergence of foundation models, such as large language models (LLMs) like GPT-4 and text-to-image models like DALL-E, has opened up numerous possibilities across various domains: people can now use natural language (i.e., prompts) to communicate with AI and perform tasks. While people can access foundation models through chatbots (e.g., ChatGPT), chat, regardless of the capabilities of the underlying models, is not a production tool for building reusable AI services. Frameworks such as LangChain enable LLM-based application development but require substantial programming knowledge, which poses a barrier for non-programmers. To mitigate this, we propose the concept of the AI chain and bring the principles and best practices accumulated over decades of software engineering into AI chain engineering, thereby systematising the AI chain engineering methodology. We also develop a no-code integrated development environment, Prompt Sapper, which embeds these AI chain engineering principles and patterns naturally into the process of building AI chains, thereby improving the performance and quality of AI chains. With Prompt Sapper, AI chain engineers can compose prompt-based AI services on top of foundation models through chat-based requirement analysis and visual programming. Our user study evaluated Prompt Sapper and demonstrated its efficiency and correctness.
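
The abstract itself contains no code, but the core idea of an AI chain can be made concrete with a minimal sketch: several prompt-based "workers" composed in sequence, where the output of one LLM call feeds the next. In the sketch below, `call_llm`, `summarize_step`, and `title_step` are hypothetical names introduced for illustration only; `call_llm` is a placeholder for whatever foundation-model API one chooses, not part of Prompt Sapper.

```python
# A minimal sketch of an "AI chain": two prompt steps composed into one
# reusable service. All names here are illustrative assumptions, not
# Prompt Sapper's actual API.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a foundation-model API call."""
    raise NotImplementedError("Wire this to an LLM provider of your choice.")

def summarize_step(document: str) -> str:
    # Worker 1: condense the raw input before downstream processing.
    return call_llm(f"Summarize the following text in three sentences:\n{document}")

def title_step(summary: str) -> str:
    # Worker 2: consume the first worker's output.
    return call_llm(f"Propose a concise title for this summary:\n{summary}")

def ai_chain(document: str) -> str:
    # Composing the prompt steps yields a reusable AI service, which is
    # roughly what Prompt Sapper lets users assemble without writing code.
    return title_step(summarize_step(document))
```

Prompt Sapper's contribution is that such chains are assembled through chat-based requirement analysis and visual programming rather than hand-written glue code like the above.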
