PRAV: A Phonetically Rich Audio Visual Corpus
[1] Yoni Bauduin,et al. Audio-Visual Speech Recognition , 2004 .
[2] Jonathan G. Fiscus,et al. DARPA TIMIT: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1 , 1993 .
[3] Prasanta Kumar Ghosh,et al. A comparative study of articulatory features from facial video and acoustic-to-articulatory inversion for phonetic discrimination , 2016, 2016 International Conference on Signal Processing and Communications (SPCOM).
[4] Timothy F. Cootes,et al. Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..
[5] Carlos Busso,et al. IEMOCAP: interactive emotional dyadic motion capture database , 2008, Lang. Resour. Evaluation.
[6] Margaret McRorie,et al. The Belfast Induced Natural Emotion Database , 2012, IEEE Transactions on Affective Computing.
[7] Timothy F. Cootes,et al. Extraction of Visual Features for Lipreading , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[8] D W Massaro,et al. Evaluation and Integration of Visual and Auditory Information in Speech Perception , 2022 .
[9] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[10] Stefan A. Frisch,et al. An audiovisual database of English speech sounds , 2003 .
[11] Dominique Estival,et al. Building an Audio-Visual Corpus of Australian English: Large Corpus Collection with an Economical Portable and Replicable Black Box , 2011, INTERSPEECH.
[12] A A Montgomery,et al. Auditory and visual contributions to the perception of consonants. , 1974, Journal of speech and hearing research.
[13] Masaaki Honda,et al. Speaker Adaptation Method for Acoustic-to-Articulatory Inversion using an HMM-Based Speech Production Model , 2004, IEICE Trans. Inf. Syst..
[14] Wesley Mattheyses,et al. Audiovisual speech synthesis: An overview of the state-of-the-art , 2015, Speech Commun..
[15] Yochai Konig,et al. "Eigenlips" for robust speech recognition , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[16] Chalapathy Neti,et al. Audio-visual speech recognition in challenging environments , 2003, INTERSPEECH.
[17] Cigdem Eroglu Erdem,et al. A Turkish audio-visual emotional database , 2013, 2013 21st Signal Processing and Communications Applications Conference (SIU).
[18] Joe Frankel,et al. Linear dynamic models for automatic speech recognition , 2004 .
[19] Wesley Mattheyses,et al. Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis , 2013, Speech Commun..
[20] Prasanta Kumar Ghosh,et al. Improved subject-independent acoustic-to-articulatory inversion , 2015, Speech Commun..
[21] Athanasios Katsamanis,et al. A Multimodal Real-Time MRI Articulatory Corpus for Speech Research , 2011, INTERSPEECH.
[22] H. McGurk,et al. Visual influences on speech perception processes , 1978, Perception & psychophysics.
[23] Michael Pucher,et al. Speaker-adaptive visual speech synthesis in the HMM-framework , 2012, INTERSPEECH.
[24] A. Macleod,et al. Quantifying the contribution of vision to speech perception in noise. , 1987, British journal of audiology.
[25] Conrad Sanderson,et al. The VidTIMIT Database , 2002 .
[26] George W. Quinn,et al. Distinguishing identical twins by face recognition , 2011, Face and Gesture 2011.
[27] John P. Lewis,et al. Automated eye motion using texture synthesis , 2005, IEEE Computer Graphics and Applications.
[28] Vladimir Pavlovic,et al. Boosting and structure learning in dynamic Bayesian networks for audio-visual speaker detection , 2002, Object recognition supported by user interaction for service robots.
[29] Ricardo Gutierrez-Osuna,et al. Audio/visual mapping with cross-modal hidden Markov models , 2005, IEEE Transactions on Multimedia.
[30] Matti Pietikäinen,et al. Bi-Modal Person Recognition on a Mobile Phone: Using Mobile Phone Data , 2012, 2012 IEEE International Conference on Multimedia and Expo Workshops.
[31] Paul Boersma,et al. Praat, a system for doing phonetics by computer , 2002 .
[32] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2022, IEEE Transactions on Multimedia.
[33] Jiri Matas,et al. XM2VTSDB: The Extended M2VTS Database , 1999 .
[34] Matti Pietikäinen,et al. OuluVS2: A multi-view audiovisual database for non-rigid mouth motion analysis , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
[35] Tony Ezzat,et al. Visual Speech Synthesis by Morphing Visemes , 2000, International Journal of Computer Vision.
[36] Louis D. Braida,et al. Evaluating the articulation index for auditory-visual input. , 1987, The Journal of the Acoustical Society of America.