Text Implicates Prosodic Ambiguity: A Corpus for Intention Identification of the Korean Spoken Language

For a large portion of real-life utterances, the intention cannot be determined from semantics or syntax alone. Although not all socio-linguistic and pragmatic information can be digitized, phonetic features at least are indispensable for understanding spoken language. Especially in head-final languages such as Korean, sentence-final intonation plays a major role in identifying the speaker's intention. This paper proposes a system that identifies the intention of an utterance given its acoustic features and text. The proposed multi-stage classification system decides whether a given utterance is a fragment, a statement, a question, a command, or a rhetorical utterance, exploiting the intonation dependency that arises from head-finality. Drawing on an intuitive understanding of the Korean language that informs the data annotation, we construct a network that identifies the intention of a spoken utterance and validate its utility with sample sentences. Combined with a speech recognizer, the system is expected to integrate flexibly into various language understanding modules.
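The multi-stage idea can be illustrated with a minimal sketch. This is not the network described in the paper: the intermediate label `AMBIGUOUS`, the function names, and both toy stage functions are assumptions introduced only for illustration. The sketch assumes a text-only first stage that either commits to a label or flags the utterance as intonation-dependent, and a second stage that consults the acoustic input (here a plain pitch contour) only for flagged utterances.

```python
from typing import Callable, Sequence

# Hypothetical intermediate label: the text alone does not settle the intention.
AMBIGUOUS = "intonation-dependent"


def identify_intention(
    text: str,
    pitch: Sequence[float],
    text_stage: Callable[[str], str],
    audio_stage: Callable[[str, Sequence[float]], str],
) -> str:
    """Run the text stage first; fall back to the audio stage only if needed."""
    label = text_stage(text)
    if label != AMBIGUOUS:
        return label
    return audio_stage(text, pitch)


def toy_text_stage(text: str) -> str:
    """Placeholder rules standing in for a trained text classifier."""
    s = text.strip()
    if s.endswith("?"):
        return "question"
    if s.endswith("."):
        return "statement"
    if len(s.split()) <= 1:
        return "fragment"
    return AMBIGUOUS


def toy_audio_stage(text: str, pitch: Sequence[float]) -> str:
    """Placeholder rule: a final pitch above the contour mean counts as rising."""
    mean_pitch = sum(pitch) / len(pitch)
    return "question" if pitch[-1] > mean_pitch else "statement"


if __name__ == "__main__":
    # Same unpunctuated text, different contours, different predicted intention.
    print(identify_intention("you ate", [100.0, 100.0, 140.0],
                             toy_text_stage, toy_audio_stage))
    print(identify_intention("you ate", [140.0, 100.0, 100.0],
                             toy_text_stage, toy_audio_stage))
```

In a real system both stages would be trained models, and the second stage would have to cover commands and rhetorical utterances as well; the toy rules above only make the control flow concrete.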
