Model Criticism for Long-Form Text Generation

Language models have demonstrated the ability to generate highly fluent text; however, it remains unclear whether their output retains coherent high-level structure (e.g., story progression). Here, we propose to apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text. Model criticism compares the distributions of real and generated data in a latent space obtained under an assumed generative process; different generative processes identify specific failure modes of the underlying model. We perform experiments on three representative aspects of high-level discourse—coherence, coreference, and topicality—and find that transformer-based language models are able to capture topical structure but have a harder time maintaining structural coherence or modeling coreference.
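The core idea—mapping real and generated documents into a shared latent space and testing whether the two latent distributions differ—can be sketched with a classical two-sample statistic. The sketch below is illustrative only: the encoder is replaced by synthetic latent vectors, and Hotelling's two-sample T² is just one possible discrepancy measure; the paper's actual checks depend on the assumed generative process.

```python
import numpy as np

def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 statistic between latent samples
    x of shape (n1, d) and y of shape (n2, d): a classical test of
    whether two sets of latent vectors share the same mean."""
    n1, n2 = len(x), len(y)
    mx, my = x.mean(axis=0), y.mean(axis=0)
    # pooled sample covariance of the two groups
    s = ((n1 - 1) * np.cov(x, rowvar=False) +
         (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    diff = mx - my
    return (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(s, diff)

rng = np.random.default_rng(0)
# Stand-ins for encoded documents; in practice these would come from
# a latent-variable model fit to real and model-generated text.
z_real = rng.normal(0.0, 1.0, size=(200, 8))
z_gen = rng.normal(0.5, 1.0, size=(200, 8))  # shifted: a structural mismatch
print(hotelling_t2(z_real, z_gen))  # large value -> latent distributions differ
```

A large statistic relative to its null distribution signals that the generated text's latent structure diverges from the real data's, which is the kind of discrepancy latent-space model criticism is designed to surface.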
