PARAD-R: Speech analysis software for meeting support

The main goal of developing systems for formal activity logging is to automate the entire process of transcribing participants' speech. In this paper we outline modern methods of audio and video signal processing and personification data analysis for multimodal speaker diarization. The proposed PARAD-R software for Russian speech analysis is currently implemented for audio-based speaker diarization and will be enhanced using advances in multimodal situation analysis in a meeting room.
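As a rough illustration of what an audio-only speaker diarization stage involves (this is a minimal sketch, not the PARAD-R pipeline itself), the example below segments a recording, discards low-energy segments with a crude voice-activity gate, extracts MFCC features, and clusters the segments by speaker. The library choices (librosa, scikit-learn), the 0.5 s segment length, the energy threshold, and the fixed speaker count are assumptions made for the example only.

```python
# Minimal illustrative sketch of audio-only speaker diarization.
# NOT the PARAD-R implementation; libraries and parameters are assumptions.
import numpy as np
import librosa
from sklearn.cluster import AgglomerativeClustering


def diarize(wav_path: str, n_speakers: int = 2, segment_s: float = 0.5):
    """Return (start_time, speaker_id) pairs for fixed-length voiced segments."""
    signal, sr = librosa.load(wav_path, sr=16000)
    hop = int(segment_s * sr)

    features, starts = [], []
    for begin in range(0, len(signal) - hop, hop):
        chunk = signal[begin:begin + hop]
        # Crude voice-activity gate: skip near-silent segments (threshold assumed).
        if np.sqrt(np.mean(chunk ** 2)) < 0.01:
            continue
        # One MFCC-based feature vector per segment.
        mfcc = librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=13)
        features.append(mfcc.mean(axis=1))
        starts.append(begin / sr)

    # Group segment-level feature vectors into speaker clusters.
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(
        np.vstack(features))
    return list(zip(starts, labels))


if __name__ == "__main__":
    for start, speaker in diarize("meeting.wav"):
        print(f"{start:7.2f}s  speaker {speaker}")
```

A full multimodal system would replace the energy gate with a proper voice activity detector and fuse the audio clusters with video cues (e.g., lip activity and head localization) before assigning speaker turns.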
