Speaker Turn Detection Based on Multimodal Situation Analysis