BBC-Oxford British Sign Language Dataset

In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL). BOBSL is an extended and publicly released version of the BSL-1K dataset [47] introduced in previous work. We describe the motivation for the dataset, together with its statistics and available annotations. We conduct experiments to provide baselines for the tasks of sign recognition, sign language alignment, and sign language translation. Finally, we describe several strengths and limitations of the data from the perspectives of machine learning and linguistics, note sources of bias present in the dataset, and discuss potential applications of BOBSL in the context of sign language technology. The dataset is available at https://www.robots.ox.ac.uk/~vgg/data/bobsl/.

[1] Trevor Johnston et al. From archive to corpus: transcription and annotation in the creation of signed language corpora. PACLIC, 2008.

[2] Andrew Zisserman et al. Large-scale Learning of Sign Language by Watching TV (Using Co-occurrences). BMVC, 2013.

[3] Oscar Koller et al. Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation. CVPR, 2020.

[4] Andrew Zisserman et al. Read and Attend: Temporal Localisation in Sign Language Videos. CVPR, 2021.

[5] Andrew Zisserman et al. Two-Stream Convolutional Networks for Action Recognition in Videos. NIPS, 2014.

[6] Jorma Laaksonen et al. S-pot: a benchmark in spotting signs within continuous signing. LREC, 2014.

[7] Andrew Zisserman et al. Learning sign language by watching TV (using weakly aligned subtitles). CVPR, 2009.

[8] Bencie Woll et al. The Linguistics of British Sign Language: An Introduction. 1999.

[9] Andrew Zisserman et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. CVPR, 2017.

[10] Lukasz Kaiser et al. Attention Is All You Need. NIPS, 2017.

[11] Xavier Giro-i-Nieto et al. How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language. CVPR, 2021.

[12] Wengang Zhou et al. Improving Sign Language Translation with Monolingual Data by Sign Back-Translation. CVPR, 2021.

[13] Triantafyllos Afouras et al. Aligning Subtitles in Sign Language Videos. arXiv, 2021.

[14] Gaël Varoquaux et al. Scikit-learn: Machine Learning in Python. JMLR, 2011.

[15] Malihe Alikhani et al. Including Signed Languages in Natural Language Processing. ACL, 2021.

[16] Petros Daras et al. A Comprehensive Study on Sign Language Recognition Methods. arXiv, 2020.

[17] Yaser Sheikh et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE TPAMI, 2018.

[18] Stan Sclaroff et al. The American Sign Language Lexicon Video Dataset. CVPR Workshops, 2008.

[19] Jordan Fenlon et al. Lexical Variation and Change in British Sign Language. PLoS ONE, 2014.

[20] Jian Sun et al. Deep Residual Learning for Image Recognition. CVPR, 2016.

[21] Mitesh M. Khapra et al. INCLUDE: A Large Scale Dataset for Indian Sign Language Recognition. ACM Multimedia, 2020.

[22] Abhishek Dutta et al. The VIA Annotation Software for Images, Audio and Video. ACM Multimedia, 2019.

[23] Tao Jiang et al. Looking for the Signs: Identifying Isolated Sign Instances in Continuous Video Footage. FG, 2021.

[24] Samuel Albanie et al. Sign Segmentation with Changepoint-Modulated Pseudo-Labelling. CVPR Workshops, 2021.

[25] Meredith Ringel Morris et al. Sign Language Recognition, Generation, and Translation: An Interdisciplinary Perspective. ASSETS, 2019.

[26] Andrew Zisserman et al. Seeing Wake Words: Audio-visual Keyword Spotting. BMVC, 2020.

[27] Jia Deng et al. RAFT: Recurrent All-Pairs Field Transforms for Optical Flow. ECCV, 2020.

[28] Enhua Wu et al. Squeeze-and-Excitation Networks. IEEE TPAMI, 2017.

[29] Joon Son Chung et al. Lip Reading in the Wild. ACCV, 2016.

[30] Giacomo Inches et al. Content4All Open Research Sign Language Translation Datasets. FG, 2021.

[31] Themos Stafylakis et al. Zero-shot keyword spotting for visual speech recognition in-the-wild. ECCV, 2018.

[32] R. Sutton-Spence. Mouthings and Simultaneity in British Sign Language. 2007.

[33] Stefanos Zafeiriou et al. RetinaFace: Single-stage Dense Face Localisation in the Wild. arXiv, 2019.

[34] Matt Huenerfauth et al. Accessibility for Deaf and Hard of Hearing Users: Sign Language Conversational User Interfaces. CUI, 2020.

[35] Moritz Knorr et al. The significance of facial features for automatic sign language recognition. FG, 2008.

[36] Avinash C. Kak et al. Purdue RVL-SLLL American Sign Language Database. 2006.

[37] Bencie Woll et al. The sign that dares to speak its name: echo phonology in British Sign Language. 2001.

[38] Sang-Ki Ko et al. Neural Sign Language Translation based on Human Keypoint Estimation. Applied Sciences, 2018.

[39] Joon Son Chung et al. Out of Time: Automated Lip Sync in the Wild. ACCV Workshops, 2016.

[40] Bo Chen et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv, 2017.

[41] Ahmet Alp Kindiroglu et al. BosphorusSign22k Sign Language Recognition Dataset. SIGNLANG, 2020.

[42] Jordan Fenlon et al. Building the British Sign Language Corpus. 2013.

[43] L. R. Rabiner et al. A comparative study of several dynamic time-warping algorithms for connected-word recognition. The Bell System Technical Journal, 1981.

[44] Hermann Ney et al. Neural Sign Language Translation. CVPR, 2018.

[45] Sarah Ebling et al. SMILE Swiss German Sign Language Dataset. LREC, 2018.

[46] Tao Jiang et al. Skeletor: Skeletal Transformers for Robust Body-Pose Estimation. CVPR Workshops, 2021.

[47] Joon Son Chung et al. BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues. ECCV, 2020.

[48] Hermann Ney et al. Efficient approximations to model-based joint tracking and recognition of continuous sign language. FG, 2008.

[49] Jie Huang et al. Video-based Sign Language Recognition without Temporal Segmentation. AAAI, 2018.

[50] Matt Huenerfauth et al. Effect of Automatic Sign Recognition Performance on the Usability of Video-Based Search Interfaces for Sign Language Dictionaries. ASSETS, 2019.

[51] Hermann Ney et al. Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers. CVIU, 2015.

[52] Vinay P. Namboodiri et al. Towards Automatic Speech to Sign Language Generation. Interspeech, 2021.

[53] Hacer Yalim Keles et al. AUTSL: A Large Scale Multi-Modal Turkish Sign Language Dataset and Baseline Methods. IEEE Access, 2020.

[54] Xin Yu et al. Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison. WACV, 2020.

[55] Naomi K. Caselli et al. The ASL-LEX 2.0 Project: A Database of Lexical and Phonological Properties for 2,723 Signs in American Sign Language. Journal of Deaf Studies and Deaf Education, 2021.

[56] Gül Varol et al. Sign Language Segmentation with Temporal Convolutional Networks. ICASSP, 2021.

[57] Oscar Koller et al. MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language. BMVC, 2018.

[58] Joon Son Chung et al. Signs in time: Encoding human motion as a temporal image. arXiv, 2016.

[59] Shuo Yang et al. WIDER FACE: A Face Detection Benchmark. CVPR, 2016.

[60] Houqiang Li et al. Attention-Based 3D-CNNs for Large-Vocabulary Sign Language Recognition. IEEE Transactions on Circuits and Systems for Video Technology, 2019.

[61] Andrew Zisserman et al. Watch, Read and Lookup: Learning to Spot Signs from Multiple Supervisors. ACCV, 2020.

[62] Paula Buttery et al. A Text Normalisation System for Non-Standard English Words. NUT@EMNLP, 2017.