From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning

Fine-tuning language models on instruction-formatted tasks has shown promise for zero-shot generalization to unseen tasks. In this paper, we introduce a straightforward yet effective method for enhancing instruction tuning by employing symbolic tasks. Compared to crowdsourced human tasks or model-generated tasks, symbolic tasks offer a unique advantage: they can be generated automatically and in vast quantities, providing a theoretically unlimited supply of high-quality training instances. To explore the potential of symbolic tasks, we carry out an extensive case study on the representative symbolic task of SQL execution. Empirical results on various benchmarks confirm that integrating SQL execution yields significant improvements in zero-shot scenarios, particularly in table reasoning. Notably, our 3B model surpasses both the 175B GPT-3 and ChatGPT in zero-shot table reasoning across four benchmarks. Furthermore, experimental results on BBH (27 tasks) and MMLU (57 tasks) show that language models can be enhanced through symbolic tasks without compromising their generality. We hope that our paper serves as a catalyst, inspiring increased efforts to incorporate symbolic tasks in instruction tuning.
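The core appeal of a symbolic task like SQL execution is that training instances can be synthesized mechanically: sample a small table, pose a SQL query, and obtain the gold answer by actually executing the query. The sketch below illustrates this idea with Python's built-in `sqlite3`; the table schema, prompt serialization, and answer formatting are illustrative assumptions, not the paper's exact recipe.

```python
import sqlite3

def make_sql_execution_example(rows, query):
    """Build one (prompt, target) instruction-tuning pair in which the
    model must return the result of executing a SQL query on a table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, points INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()

    # Flatten the table and query into a single prompt string. The exact
    # serialization (headers, separators) is a hypothetical choice here.
    header = "col : name | points"
    body = " ".join(f"row {i + 1} : {n} | {p}"
                    for i, (n, p) in enumerate(rows))
    prompt = (f"Execute the SQL query on the table.\n"
              f"SQL: {query}\nTable: {header} {body}")

    # Single-column results are flattened to a comma-separated string.
    if result and len(result[0]) == 1:
        target = ", ".join(str(v) for (v,) in result)
    else:
        target = str(result)
    return prompt, target

rows = [("alice", 3), ("bob", 7)]
prompt, target = make_sql_execution_example(
    rows, "SELECT name FROM t WHERE points > 5")
```

Because the executor itself produces the label, the pipeline above can emit arbitrarily many correct instances without human annotation, which is the property the abstract highlights.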
