Aligning ASL for Statistical Translation Using a Discriminative Word Model

We describe a method to align ASL video subtitles with a closed-caption transcript. Our alignments are partial, based on spotting words within the video sequence, which consists of continuous (rather than isolated) signs with unknown word boundaries. We start with windows known to contain an example of a word, but not limited to that word. We estimate the start and end of the word in these examples using a voting method. This provides a small number of training examples (typically three per word). Since there is no shared structure, we use a discriminative rather than a generative word model. While our word spotters are not perfect, they are sufficient to establish an alignment. We demonstrate that even a small number of good word spotters results in an alignment good enough to produce simple English-ASL translations, both by phrase matching and by word substitution.
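The abstract does not spell out the voting scheme, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes each window is a list of per-frame feature vectors, that a frame in one window "votes" for the word when every other window contains a frame within a distance threshold, and that the word occupies the longest contiguous run of well-supported frames. The function name `vote_boundaries`, the Euclidean distance, and the `threshold` and `min_votes` parameters are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def vote_boundaries(windows, target, threshold, min_votes=None):
    """Estimate (start, end) frame indices of the shared word in windows[target].

    windows   : list of windows; each window is a list of feature tuples
    target    : index of the window whose word boundaries we want
    threshold : assumed maximum distance for two frames to count as a match
    min_votes : votes a frame needs; defaults to "all other windows agree"
    """
    others = [w for k, w in enumerate(windows) if k != target]
    if min_votes is None:
        min_votes = len(others)

    # One vote per other window that contains at least one matching frame.
    votes = [
        sum(any(dist(frame, g) <= threshold for g in other) for other in others)
        for frame in windows[target]
    ]

    # Pick the longest contiguous run of frames with enough votes.
    best, run_start = None, None
    for i, v in enumerate(votes + [-1]):  # sentinel closes a trailing run
        if v >= min_votes:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if best is None or i - run_start > best[1] - best[0] + 1:
                best = (run_start, i - 1)
            run_start = None
    return best, votes


# Toy 1-D features: three windows that share the pattern 5, 6, 7.
windows = [
    [(0.0,), (5.0,), (6.0,), (7.0,), (1.0,)],
    [(9.0,), (5.1,), (6.1,), (7.1,)],
    [(5.0,), (6.0,), (6.9,), (3.0,), (3.5,)],
]
span, votes = vote_boundaries(windows, target=0, threshold=0.2)
print(span, votes)
```

With three windows per word, as in the abstract, each frame can collect at most two votes, so the default `min_votes` demands agreement from both other examples. A real system would likely use richer features and a temporal matching step rather than independent per-frame matches.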
