Unsupervised Learning of Style-sensitive Word Vectors

This paper presents the first study aimed at capturing stylistic similarity between words in an unsupervised manner. We propose extending the continuous bag of words (CBOW) model (Mikolov et al., 2013) to learn style-sensitive word vectors using a wider context window, under the assumption that the style of all the words in an utterance is consistent. In addition, we introduce a novel task of predicting lexical stylistic similarity and create a benchmark dataset for this task. Our experiment with this dataset supports our assumption and demonstrates that the proposed extensions contribute to the acquisition of style-sensitive word embeddings.
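To make the idea of a wider context window concrete, the following is a minimal sketch of how CBOW training pairs could be built from one utterance. It is not the paper's implementation: the function name `cbow_pairs` and the convention that `window=None` means "use the whole utterance as context" are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def cbow_pairs(
    utterance: List[str], window: Optional[int] = None
) -> List[Tuple[List[str], str]]:
    """Return (context words, target word) pairs for one utterance.

    window=None is an illustrative convention (not from the paper):
    the context is every other word in the utterance, reflecting the
    assumption that style is consistent across the utterance.
    An integer gives a standard fixed-size CBOW window.
    """
    pairs = []
    n = len(utterance)
    for i, target in enumerate(utterance):
        if window is None:
            lo, hi = 0, n
        else:
            lo, hi = max(0, i - window), min(n, i + window + 1)
        # Collect all words in [lo, hi) except the target position itself.
        context = [utterance[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    words = ["a", "b", "c", "d", "e"]
    print(cbow_pairs(words, window=1)[2])  # narrow window around "c"
    print(cbow_pairs(words)[2])            # whole utterance as context
```

With a narrow window the context mostly reflects local syntactic and semantic cues, whereas the utterance-wide context exposes the target word to every co-occurring word, which is where the style-consistency assumption would come into play.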

[1] Ryuichiro Higashinaka, et al. Automatic conversion of sentence-end expressions for utterance characterization of dialogue systems, 2015, PACLIC.

[2] Ehud Rivlin, et al. Placing search in context: the concept revisited, 2002, TOIS.

[3] Lyle H. Ungar, et al. Discovering User Attribute Stylistic Differences via Paraphrasing, 2016, AAAI.

[4] Felix Hill, et al. SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity, 2016, EMNLP.

[5] Jianfeng Gao, et al. A Persona-Based Neural Conversation Model, 2016, ACL.

[6] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[7] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[8] Lyle H. Ungar, et al. Exploring Stylistic Variation with Age and Income on Twitter, 2016, ACL.

[9] Ani Nenkova, et al. Inducing Lexical Style Properties for Paraphrase and Genre Differentiation, 2015, NAACL.

[10] Marine Carpuat, et al. A Study of Style in Machine Translation: Controlling the Formality of Machine Translation Output, 2017, EMNLP.

[11] Mihoko Teshigawara, et al. Modern Japanese “Role Language” (Yakuwarigo): fictionalised orality in Japanese literature and popular culture, 2012.

[12] Joel R. Tetreault, et al. An Empirical Analysis of Formality in Online Communication, 2016, TACL.

[13] Rico Sennrich, et al. Controlling Politeness in Neural Machine Translation via Side Constraints, 2016, NAACL.

[14] Kentaro Inui, et al. Generating Stylistically Consistent Dialog Responses with Transfer Learning, 2017, IJCNLP.

[15] Marine Carpuat, et al. Discovering Stylistic Variations in Distributional Vector Space Models via Lexical Paraphrases, 2017.

[16] Mamoru Komachi, et al. Construction of a Japanese Word Similarity Dataset, 2017, LREC.

[17] Marilyn A. Walker, et al. PERSONAGE: Personality Generation for Dialogue, 2007, ACL.

[18] Nebojsa Jojic, et al. Steering Output Style and Topic in Neural Response Generation, 2017, EMNLP.

[19] Wei Xu, et al. From Shakespeare to Twitter: What are Language Styles all about?, 2017.