A Dataset for Linguistic Understanding, Visual Evaluation, and Recognition of Sign Languages: The K-RSL

The paper presents the first dataset that aims to serve interdisciplinary purposes, benefiting both the computer vision community and sign language linguistics. To date, the majority of Sign Language Recognition (SLR) approaches treat sign language recognition as a manual gesture recognition problem. However, signers also use other articulators, including facial expressions and head and body position and movement, to convey linguistic information. Given the important role of these non-manual markers, this paper proposes a dataset and presents a use case that demonstrates the importance of including non-manual features to improve the recognition accuracy of signs. To the best of our knowledge, no prior publicly available dataset explicitly focuses on the non-manual components responsible for the grammar of sign languages. To this end, the proposed dataset contains 28,250 high-resolution, high-quality videos of signs, annotated for both manual and non-manual components. We conducted a series of evaluations to investigate whether non-manual components improve sign recognition accuracy. We release the dataset to support SLR researchers and to help advance current progress toward real-time sign language interpretation. The dataset will be made publicly available at https://krslproject.github.io/krsl-corpus
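As a minimal sketch of the kind of comparison the abstract describes (manual features alone versus manual plus non-manual features), the following hypothetical Python example trains a simple classifier on synthetic per-video keypoint features. The feature dimensions, the synthetic data, and the choice of classifier are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical illustration: does appending non-manual (face) features to
# manual (hand) features change sign classification accuracy? All data here
# is synthetic; in practice the features would come from a pose estimator.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_videos, n_classes = 600, 10
labels = rng.integers(0, n_classes, size=n_videos)

# Pretend each video is summarised as a fixed-length vector of 2D keypoints
# averaged over frames (assumed dimensions, roughly OpenPose-style):
#   2 hands x 21 keypoints x (x, y) = 84 manual features
#   70 face keypoints x (x, y)      = 140 non-manual features
manual = rng.normal(size=(n_videos, 84)) + labels[:, None] * 0.05
non_manual = rng.normal(size=(n_videos, 140)) + labels[:, None] * 0.05

def mean_cv_accuracy(features):
    # 5-fold cross-validated accuracy of a linear classifier.
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, features, labels, cv=5).mean()

print("manual only:        ", mean_cv_accuracy(manual))
print("manual + non-manual:", mean_cv_accuracy(np.hstack([manual, non_manual])))
```

The design point is only that the non-manual features enter as additional input dimensions; with real keypoint data, the same comparison isolates their contribution to recognition accuracy.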
