DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation

There is an undeniable communication barrier between deaf people and people with normal hearing ability. Although innovations in sign language translation technology aim to tear down this barrier, the majority of existing sign language translation systems are either intrusive or constrained by resolution or ambient lighting conditions. Moreover, these systems can only translate single American Sign Language (ASL) signs rather than full sentences, making them much less useful in daily-life communication scenarios. In this work, we fill this critical gap by presenting DeepASL, a transformative deep learning-based sign language translation technology that enables ubiquitous and non-intrusive ASL translation at both word and sentence levels. DeepASL uses infrared light as its sensing mechanism to non-intrusively capture ASL signs. It incorporates a novel hierarchical bidirectional deep recurrent neural network (HB-RNN) for word-level translation and a probabilistic framework based on Connectionist Temporal Classification (CTC) for sentence-level translation. To evaluate its performance, we collected 7,306 samples from 11 participants, covering 56 commonly used ASL words and 100 ASL sentences. DeepASL achieves an average word-level translation accuracy of 94.5% and an average word error rate of 8.2% when translating unseen ASL sentences. Given this promising performance, we believe DeepASL represents a significant step towards breaking the communication barrier between deaf people and the hearing majority, and thus has significant potential to fundamentally change deaf people's lives.
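The sentence-level results above rest on two standard ideas: decoding a CTC output sequence into words, and scoring that output with word error rate (WER). The sketch below illustrates both under stated assumptions. It uses plain greedy (best-path) CTC decoding, which may differ from the decoder DeepASL actually uses. The function names, the blank index of 0, and the gloss vocabulary (`"I"`, `"WANT"`, `"DRINK"`) are hypothetical and chosen for illustration only. WER is computed in the usual way: (substitutions + deletions + insertions) divided by the number of reference words, obtained from a word-level edit distance.

```python
from itertools import groupby


def ctc_greedy_decode(frame_probs, vocab, blank_index=0):
    """Best-path CTC decoding (a simple stand-in, not necessarily DeepASL's decoder).

    frame_probs: one probability list per time frame, indexed like vocab.
    vocab: label strings; vocab[blank_index] is the hypothetical CTC blank.
    """
    # 1. Pick the most likely label index in each frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Collapse consecutive repeats (CTC merges them before removing blanks).
    collapsed = [k for k, _ in groupby(best)]
    # 3. Drop blanks and map the remaining indices to word labels.
    return [vocab[k] for k in collapsed if k != blank_index]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference sentence must contain at least one word")
    # dist[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[n][m] / n


if __name__ == "__main__":
    vocab = ["<blank>", "I", "WANT", "DRINK"]  # hypothetical gloss vocabulary
    frame_probs = [
        [0.1, 0.8, 0.05, 0.05],   # I
        [0.2, 0.7, 0.05, 0.05],   # I (repeat, collapsed)
        [0.9, 0.05, 0.03, 0.02],  # blank
        [0.1, 0.1, 0.7, 0.1],     # WANT
        [0.3, 0.1, 0.5, 0.1],     # WANT (repeat, collapsed)
        [0.6, 0.1, 0.2, 0.1],     # blank
        [0.1, 0.1, 0.1, 0.7],     # DRINK
    ]
    hypothesis = ctc_greedy_decode(frame_probs, vocab)
    print(hypothesis)  # -> ['I', 'WANT', 'DRINK']
    print(word_error_rate(["I", "WANT", "DRINK", "WATER"], hypothesis))  # -> 0.25
```

In the usage example, the decoder collapses the repeated frames, drops the blanks, and yields three words. Against a four-word reference, the one missing word counts as a single deletion, giving a WER of 1/4. Because repeats are merged before blanks are removed, a blank between two identical labels is what allows CTC to emit the same word twice in a row.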
