The Audio-Video Australian English Speech Data Corpus AVOZES

This paper presents the Audio-Video Australian English Speech data corpus AVOZES. It contains recordings of 20 speakers uttering a variety of phrases. The corpus was designed for research on the statistical relationship between audio and video speech parameters, with an audio-video (AV) automatic speech recognition (ASR) task in mind, but it may also be useful for other research tasks. AVOZES is the first published AV speaking-face data corpus for Australian English. It is novel in its modular design and in its use of a stereo camera system for the video recordings.
