Text, Speech, and Dialogue

The paper surveys corpora of the Russian language and the state of the art in Russian corpus linguistics, focusing on the Russian National Corpus and on specialized corpora.
