Improving Sign Recognition with Phonology

We use insights from research on American Sign Language (ASL) phonology to train models for isolated sign language recognition (ISLR), a step towards automatic sign language understanding. Our key insight is that explicitly modeling phonology in sign production yields more accurate ISLR than existing work, which does not consider sign language phonology. We train ISLR models that take pose estimates of a signer producing a single sign and predict not only the sign but also its phonological characteristics, such as the handshape. These auxiliary predictions lead to a nearly 9% absolute gain in sign recognition accuracy on the WLASL benchmark, with consistent improvements regardless of the underlying model architecture. This work has the potential to accelerate linguistic research on signed languages and to reduce communication barriers between deaf and hearing people.
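To make the multi-task setup concrete, below is a minimal PyTorch sketch of a pose-based recognizer with a shared encoder, one head for the sign gloss, and auxiliary heads for phonological parameters such as handshape. The encoder choice, class counts, parameter names, and the `aux_weight` loss weighting are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhonologyAwareISLR(nn.Module):
    """Sketch of multi-task ISLR: a shared pose encoder feeds a sign-gloss
    classifier plus auxiliary classifiers for phonological features.
    All architecture details and sizes here are hypothetical."""

    def __init__(self, num_keypoints=75, num_signs=2000,
                 phoneme_classes={"handshape": 49, "major_location": 8}):
        super().__init__()
        d_model = 256
        # Each frame is a flattened (x, y) pose estimate per keypoint.
        self.embed = nn.Linear(num_keypoints * 2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.sign_head = nn.Linear(d_model, num_signs)
        # One auxiliary head per phonological parameter.
        self.phoneme_heads = nn.ModuleDict(
            {name: nn.Linear(d_model, n) for name, n in phoneme_classes.items()}
        )

    def forward(self, poses):
        # poses: (batch, frames, num_keypoints * 2)
        h = self.encoder(self.embed(poses)).mean(dim=1)  # pool over time
        return self.sign_head(h), {n: head(h) for n, head in self.phoneme_heads.items()}

def multitask_loss(sign_logits, phoneme_logits, sign_y, phoneme_y, aux_weight=0.1):
    # Sign recognition loss plus down-weighted auxiliary phonological losses.
    loss = F.cross_entropy(sign_logits, sign_y)
    for name, logits in phoneme_logits.items():
        loss = loss + aux_weight * F.cross_entropy(logits, phoneme_y[name])
    return loss
```

In this framing the phonological heads act as auxiliary supervision: they are used only at training time to shape the shared representation, and the sign-gloss head alone is needed at inference.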
