Benchmark Databases for Video-Based Automatic Sign Language Recognition

For the evaluation and benchmarking of automatic sign language recognition, large corpora are needed. Recent research has focused mainly on isolated sign language recognition using video sequences recorded under laboratory conditions with special hardware such as data gloves. Such databases have typically contained only one speaker, and have therefore been speaker-dependent, and have had only small vocabularies. This paper presents a new, linguistically annotated video database for automatic sign language recognition: the RWTH-BOSTON-400 corpus, which consists of 843 sentences, several speakers, and separate subsets for training, development, and testing, is described in detail. A new database access interface, designed to provide fast access to database statistics and content, makes it possible to browse the video database and retrieve particular subsets easily. Preliminary baseline results on the new corpora are presented. In contrast to other work in this area, all databases presented in this paper will be publicly available.
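To make the corpus organization concrete, the following is a minimal sketch of how such a database access interface could expose the train/development/test partitions and speaker metadata. All class, field, and file names here are illustrative assumptions for exposition; they are not the actual RWTH-BOSTON-400 interface or annotation format.

```python
# Hypothetical corpus access layer; names and layout are assumptions,
# not the published RWTH-BOSTON-400 tooling.
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Utterance:
    video_path: str            # path to the recorded video sequence
    gloss_sequence: List[str]  # linguistic gloss annotation of the sentence
    speaker: str               # speaker identifier
    subset: str                # "train", "dev", or "test"


class CorpusIndex:
    def __init__(self, utterances: List[Utterance]):
        self.utterances = utterances

    def subset(self, name: str) -> List[Utterance]:
        """Return all utterances belonging to one partition."""
        return [u for u in self.utterances if u.subset == name]

    def by_speaker(self, speaker: str) -> List[Utterance]:
        """Return all utterances signed by a given speaker."""
        return [u for u in self.utterances if u.speaker == speaker]

    def vocabulary(self) -> Set[str]:
        """Collect the gloss vocabulary over the whole corpus."""
        return {g for u in self.utterances for g in u.gloss_sequence}


# Example usage: report simple per-subset statistics.
if __name__ == "__main__":
    corpus = CorpusIndex([
        Utterance("videos/001.mpg", ["JOHN", "LIKE", "CHOCOLATE"], "speaker1", "train"),
        Utterance("videos/002.mpg", ["MARY", "BUY", "BOOK"], "speaker2", "test"),
    ])
    print(len(corpus.subset("train")), "training sentences")
    print(sorted(corpus.vocabulary()))
```

Such an index makes it straightforward to assemble speaker-independent experimental setups, e.g. training on one set of speakers and evaluating on utterances from unseen speakers.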
