Real-Time Depth-Based Hand Tracking for American Sign Language Recognition

There are estimated to be more than a million Deaf and severely hard-of-hearing individuals living in the United States. For many of these individuals, American Sign Language (ASL) is their primary means of communication. However, for most day-to-day interactions, native ASL users must either get by with a mixture of gestures and written communication in a non-native language or seek the assistance of an interpreter. Whereas advances towards automated translation between many other languages have benefited greatly from decades of research into speech recognition and Statistical Machine Translation, ASL's lack of aural and written components has limited exploration into automated translation of ASL. In this thesis, I focus on work towards recognizing components of American Sign Language in real time. I first evaluate the suitability of a real-time, depth-based generative hand tracking model for estimating ASL handshapes. I then present a study of ASL fingerspelling recognition, in which real-time tracking and classification methods are applied to continuous sign sequences. Finally, I discuss the future steps needed to extend real-time fingerspelling recognition to the problem of general ASL recognition.
