Paper information - The audio-video Australian English speech data corpus AVOZES

The audio-video Australian English speech data corpus AVOZES

This paper presents the Audio-Video Australian English Speech data corpus AVOZES. It contains recordings of 20 speakers uttering a variety of phrases. The corpus was designed for research on the statistical relationship of audio and video speech parameters with an audio-video (AV) automatic speech recognition (ASR) task in mind, but may be useful for other research tasks. AVOZES is the first published AV speaking-face data corpus for Australian English and is novel in its use of a stereo camera system for the video recordings and its modular design.

Roland Göcke | J. Bruce Millar

[1] Jiri Matas, et al. XM2VTSDB: The Extended M2VTS Database, 1999.

[2] Jiri Matas, et al. Acquisition of a Large Database for Biometric Identity Verification, 1998.

[3] Eric David Petajan, et al. Automatic Lipreading to Enhance Speech Recognition (Speech Reading), 1984.

[4] O. Faugeras, et al. On determining the fundamental matrix: analysis of different methods and experimental results, 1993.

[5] Emanuele Trucco, et al. Introductory techniques for 3-D computer vision, 1998.

[6] Masayuki Inaba, et al. Real-time color stereo vision system for a mobile robot based on field multiplexing, 1997, Proceedings of International Conference on Robotics and Automation.

[7] H. C. Longuet-Higgins, et al. A computer algorithm for reconstructing a scene from two projections, 1981, Nature.

[8] Jean-Philippe Thiran, et al. The BANCA Database and Evaluation Protocol, 2003, AVBPA.

[9] Olivier D. Faugeras, et al. The fundamental matrix: Theory, algorithms, and stability analysis, 2004, International Journal of Computer Vision.

[10] Thomas Ertl, et al. Computer Graphics - Principles and Practice, 3rd Edition, 2014.

[11] M. Carter. Computer graphics: Principles and practice, 1997.

[12] A. Zelinsky. Error Analysis of Head Pose and Gaze Direction from Stereo Vision, 1999.

[13] Alexander Zelinsky, et al. Real-time stereo tracking for head pose and gaze estimation, 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[14] Gerasimos Potamianos, et al. Speaker independent audio-visual database for bimodal ASR, 1997, AVSP.

[15] J. C. Junqua, et al. The Lombard reflex and its role on human listeners and automatic speech recognizers, 1993, The Journal of the Acoustical Society of America.

[16] Javier R. Movellan, et al. Channel Separability in the Audio-Visual Integration of Speech: A Bayesian Approach, 1996.

[17] J. B. Millar, et al. The Australian National Database of Spoken Language, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[18] R. Y. Tsai, et al. An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision, 1986, CVPR 1986.

[19] J. N. Gowdy, et al. CUAVE: A new audio-visual database for multimodal human-computer interface research, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[20] Gang Xu, et al. Epipolar Geometry in Stereo, Motion and Object Recognition, 1996, Computational Imaging and Vision.

[21] Sundaram Ganapathy, et al. Decomposition of transformation matrices for robot vision, 1984, Pattern Recognit. Lett.

[22] Michael Wagner, et al. Aspects of speaking-face data corpus design methodology, 2004, INTERSPEECH.

[23] Juergen Luettin, et al. Continuous Audio-Visual Speech Recognition, 1998, ECCV.

[24] Jerry D. Gibson. Principles of digital and analog communications, 1989.

[25] Javier R. Movellan, et al. Visual Speech Recognition with Stochastic Networks, 1994, NIPS.

[26] Yoni Bauduin, et al. Audio-Visual Speech Recognition, 2004.

[27] Alexander Zelinsky, et al. Validation of an automatic lip-tracking algorithm and design of a database for audio-video speech processing, 2000.

[28] M. Woodward, et al. Phoneme perception in lipreading, 1960, Journal of speech and hearing research.

[29] Eric D. Petajan. Automatic lipreading to enhance speech recognition, 1984.