Russian Pragmatic Markers Database: Developing Speech Technologies for Everyday Spoken Discourse

This paper presents recent results from an ongoing project on Russian pragmatic markers. Pragmatic markers are obligatory elements of natural speech in any language; moreover, they are considered functionally important for speech production and for overcoming inevitable speech difficulties. A correct understanding of the use and functions of pragmatic markers is a prerequisite for solving many applied tasks in speech technology. The research draws on two speech corpora: the ORD corpus of Russian everyday speech, known as the “One Day of Speech” corpus, and the SAT corpus (“Balanced Annotated Collection of Texts”), which consists primarily of monologues. The article describes a database of Russian pragmatic markers designed to support both linguistic and pragmatic studies of spoken Russian and the development of speech technologies for everyday discourse. It also presents current statistical data on the distribution of pragmatic markers in natural speech depending on various factors.
