StyleKQC: A Style-Variant Paraphrase Corpus for Korean Questions and Commands

Paraphrasing is often performed with little concern for controlled style conversion. For questions and commands in particular, style-variant paraphrasing can be crucial for tone and manner, which also matters in industrial applications such as dialog systems. In this paper, we address this issue with a corpus construction scheme that simultaneously considers the core content and the style of directives, namely intent and formality, for the Korean language. Starting from manually generated natural language queries on six daily topics, we expand the corpus into formal and informal sentences through human rewriting and style transfer. We verify the validity and industrial applicability of our approach by confirming that fine-tuned models achieve adequate classification and inference performance on the corpus, and we also propose a supervised formality transfer task.
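As a rough illustration of how a corpus that pairs intent with formality can serve both classification and supervised formality transfer, the sketch below models one hypothetical entry as a group of sentences that share a topic and an intent but differ in style. The class name `StyleVariantGroup`, its field names, and the label strings are assumptions made for illustration, not the released schema. The Korean examples all mean roughly "How is the weather today?": the first two use polite or deferential endings, the last uses the casual ending.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class StyleVariantGroup:
    """One hypothetical corpus entry: a single directive rendered in two styles.

    Field names and label values are illustrative assumptions,
    not the actual StyleKQC data format.
    """
    topic: str            # one of the six daily topics (label names assumed)
    intent: str           # core content of the directive (label names assumed)
    formal: list[str]     # formal rewrites of the same directive
    informal: list[str]   # informal rewrites of the same directive


def classification_examples(
    groups: Iterable[StyleVariantGroup],
) -> Iterator[tuple[str, str, str]]:
    """Yield (sentence, intent, formality) triples for classifier fine-tuning."""
    for g in groups:
        for s in g.formal:
            yield s, g.intent, "formal"
        for s in g.informal:
            yield s, g.intent, "informal"


def transfer_pairs(groups: Iterable[StyleVariantGroup]) -> Iterator[tuple[str, str]]:
    """Yield (informal, formal) pairs for supervised formality transfer.

    Every informal variant is paired with every formal variant in the same
    group, on the assumption that all variants share intent and content.
    """
    for g in groups:
        for src in g.informal:
            for tgt in g.formal:
                yield src, tgt


if __name__ == "__main__":
    groups = [
        StyleVariantGroup(
            topic="weather",          # hypothetical topic label
            intent="wh-question",     # hypothetical intent label
            formal=["오늘 날씨 어떤가요?", "오늘 날씨가 어떻습니까?"],
            informal=["오늘 날씨 어때?"],
        )
    ]
    for example in classification_examples(groups):
        print(example)
    for pair in transfer_pairs(groups):
        print(pair)
```

Pairing all variants within a group is only one possible design; a real split would also need to keep groups from leaking across train and test sets, since variants of the same directive are near-duplicates in content.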
