ToTTo: A Controlled Table-To-Text Generation Dataset

We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.
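The task input described above (a table plus a set of highlighted cells) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ControlledExample` container, its field names, the `linearize_highlighted` helper, and the toy table are invented and are not taken from the dataset release or the paper's baselines. The sketch only shows one plausible way to flatten highlighted cells into a text input for a sequence-to-sequence model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ControlledExample:
    """Hypothetical container for one example; field names are illustrative only."""
    page_title: str
    table: List[List[str]]              # row-major cell text; row 0 holds the headers
    highlighted: List[Tuple[int, int]]  # (row, column) indices of highlighted cells
    target: str                         # one-sentence reference description


def linearize_highlighted(example: ControlledExample) -> str:
    """Flatten the highlighted cells into 'header: value' pairs, prefixed by the title."""
    headers = example.table[0]
    parts = [f"title: {example.page_title}"]
    for row, col in example.highlighted:
        parts.append(f"{headers[col]}: {example.table[row][col]}")
    return " | ".join(parts)


# Invented toy data, not drawn from the dataset.
example = ControlledExample(
    page_title="Example Athlete",
    table=[
        ["Year", "Competition", "Position"],
        ["2010", "World Junior Championships", "3rd"],
        ["2012", "Olympic Games", "5th"],
    ],
    highlighted=[(2, 0), (2, 1), (2, 2)],
    target="Example Athlete finished 5th at the 2012 Olympic Games.",
)

source_text = linearize_highlighted(example)
print(source_text)
```

A model trained on such pairs would map `source_text` to `target`. Restricting the input to highlighted cells is what makes the generation controlled, although a real system might also encode the rest of the table as context.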

Diyi Yang | Manaal Faruqui | Bhuwan Dhingra | Ankur P. Parikh | Dipanjan Das | Sebastian Gehrmann | Xuezhi Wang
