Towards Attribute-Entangled Controllable Text Generation: A Pilot Study of Blessing Generation

Controllable Text Generation (CTG) has achieved great success thanks to its fine-grained generation ability, which comes from conditioning on multiple attributes. However, most existing CTG research overlooks how attribute entanglement can be used to enhance the diversity of the controlled generated texts. To address this gap, we focus on a novel CTG scenario, blessing generation, which is challenging because high-quality blessing texts require CTG models to comprehensively consider the entanglement between multiple attributes (e.g., objects and occasions). To promote research on blessing generation, we present EBleT, a large-scale Entangled Blessing Text dataset containing 293K English sentences annotated with multiple attributes. Furthermore, we propose novel evaluation metrics to measure the quality of the blessing texts generated by the baseline models we designed. Our study opens a new research direction for controllable text generation and enables the development of attribute-entangled CTG models. Our dataset and source code are available at https://github.com/huangshulin123/Blessing-Generation.
