HMM-based Approaches to Model Multichannel Information in Sign Language, Inspired by Articulatory Feature-based Speech Processing

Sign language conveys information through multiple channels, such as hand shape, hand movement, and mouthing. Modeling this multichannel information is a highly challenging problem. In this paper, we elucidate the link between spoken language and sign language in terms of the production and perception phenomena. Through this link, we show that hidden Markov model-based approaches developed to model "articulatory" features in spoken language processing can be exploited to model the multichannel information inherent in sign language for sign language processing.
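
To make the link concrete, below is a minimal sketch of one HMM-based approach from the articulatory-feature speech processing literature applied to multichannel sign data: a Kullback-Leibler divergence based HMM, where each state holds a categorical reference distribution per channel, and each frame is scored by the weighted KL divergence between those references and the posterior probabilities emitted by channel-specific classifiers (e.g., hand shape and hand movement). The function names, the equal channel weights, and the toy dimensions are illustrative assumptions, not details from the paper.

import numpy as np

def kl_local_score(y_state, z_t, eps=1e-12):
    """KL(y_state || z_t): divergence between an HMM state's categorical
    reference distribution and a frame-level classifier posterior."""
    y = np.clip(y_state, eps, 1.0)
    z = np.clip(z_t, eps, 1.0)
    return np.sum(y * np.log(y / z))

def multichannel_cost(state_refs, frame_posteriors, weights):
    """Weighted sum of per-channel KL scores (hand shape, movement, ...)."""
    return sum(w * kl_local_score(y, z)
               for y, z, w in zip(state_refs, frame_posteriors, weights))

def viterbi(costs, trans_costs):
    """Min-cost state path given a (T, S) local-cost matrix and an
    (S, S) matrix of transition costs (negative log probabilities)."""
    T, S = costs.shape
    delta = np.full((T, S), np.inf)   # best accumulated cost per state
    psi = np.zeros((T, S), dtype=int) # best predecessor per state
    delta[0] = costs[0]
    for t in range(1, T):
        for s in range(S):
            prev = delta[t - 1] + trans_costs[:, s]
            psi[t, s] = int(np.argmin(prev))
            delta[t, s] = prev[psi[t, s]] + costs[t, s]
    path = [int(np.argmin(delta[-1]))]
    for t in range(T - 1, 0, -1):     # backtrack from the best final state
        path.append(psi[t, path[-1]])
    return path[::-1]

# Toy usage: 2 channels (hand shape with 4 classes, movement with 3),
# 3 HMM states, 5 frames of synthetic posteriors.
rng = np.random.default_rng(0)
T, S = 5, 3
refs = [rng.dirichlet(np.ones(4), size=S), rng.dirichlet(np.ones(3), size=S)]
posts = [rng.dirichlet(np.ones(4), size=T), rng.dirichlet(np.ones(3), size=T)]
costs = np.array([[multichannel_cost([r[s] for r in refs],
                                     [p[t] for p in posts], [0.5, 0.5])
                   for s in range(S)] for t in range(T)])
trans = -np.log(np.full((S, S), 1.0 / S))  # uniform transition costs
print(viterbi(costs, trans))

Because this formulation treats the posterior distributions themselves as features, each channel's classifier can be trained independently and the channels combined through the per-channel weights, which is what makes the multistream extension a natural fit for the manual and non-manual channels of sign language.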
