Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks

Linguistic style conveys the social context in which communication occurs and defines particular ways of using language to engage with the audiences that the text reaches. In this work, we address the task of stylistic transfer in natural language generation (NLG) systems, which could have applications in disseminating knowledge across styles, automatic summarization, and author obfuscation. The main challenges in this task are the lack of parallel training data and the difficulty of using stylistic features to control generation. To address these challenges, we plan to investigate neural network approaches to NLG that automatically learn stylistic features and incorporate them into the language generation process. We identify several evaluation criteria and propose manual and automatic evaluation approaches.
