Audio-visual Evaluation and Detection of Word Prominence in a Human-Machine Interaction Scenario
[1] Jon Barker, et al. An audio-visual corpus for speech perception and automatic speech recognition, 2006, The Journal of the Acoustical Society of America.
[2] Martin Heckmann, et al. Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition, 2002, EURASIP J. Adv. Signal Process.
[3] Volker Strom, et al. Visual prosody: facial movements accompanying speech, 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.
[4] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[5] Jeffery A. Jones, et al. Visual Prosody and Speech Intelligibility, 2004, Psychological Science.
[6] Björn Granström, et al. Visual correlates to prominence in several expressive modes, 2006, INTERSPEECH.
[7] Dorothea Kolossa, et al. Audiovisual speech recognition with missing or unreliable data, 2009, AVSP.
[8] Emiel Krahmer, et al. Facial expression and prosodic prominence: Effects of modality and facial area, 2008, J. Phonetics.
[9] Paul Taylor, et al. Festival Speech Synthesis System, 1998.
[10] Elmar Nöth, et al. VERBMOBIL: the use of prosody in the linguistic components of a speech understanding system, 2000, IEEE Trans. Speech Audio Process.
[11] Mattias Heldner, et al. On the reliability of overall intensity and spectral emphasis as acoustic correlates of focal accents in Swedish, 2003, J. Phonetics.
[12] Elizabeth Shriberg, et al. Spontaneous speech: how people really talk and why engineers should care, 2005, INTERSPEECH.
[13] Samer Al Moubayed, et al. Effects of visual prominence cues on speech intelligibility, 2009, AVSP.
[14] H. Hill, et al. Visual Correlates of Prosodic Contrastive Focus in French: Description and Inter-Speaker Variability, 2006.
[15] Martin Heckmann, et al. Listen to the parrot: Demonstrating the quality of online pitch and formant extraction via feature-based resynthesis, 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[16] Martin Heckmann, et al. Combining rate and place information for robust pitch extraction, 2007, INTERSPEECH.
[17] Andreas Stolcke, et al. Dialogue act modeling for automatic tagging and recognition of conversational speech, 2000, CL.
[18] Petra Wagner, et al. Focus Perception and Prominence, 1998.
[19] Julia Hirschberg, et al. Prosodic and other cues to speech recognition failures, 2004, Speech Commun.
[20] Guillaume Gibert, et al. Prosody for the eyes: quantifying visual prosody using guided principal component analysis, 2010, INTERSPEECH.
[21] Hiroshi G. Okuno, et al. Automatic speech recognition improved by two-layered audio-visual integration for robot audition, 2009, 2009 9th IEEE-RAS International Conference on Humanoid Robots.
[22] Chalapathy Neti, et al. Recent advances in the automatic recognition of audiovisual speech, 2003, Proc. IEEE.