Using a Data-Driven Approach for Modeling Timing Parameters of American Sign Language

While many organizations provide a website in multiple languages, few provide a sign-language version for deaf users, many of whom have lower written-language literacy. Rather than providing difficult-to-update videos of humans, a more practical solution would be for the organization to specify a script (representing the sequence of words) from which a sign-language animation is generated. The challenge is that we must select accurate speed and timing for the signs. In this work, focused on American Sign Language (ASL), motion-capture data recorded from humans is used to train machine learning models to calculate realistic timing for ASL animation movements, with an initial focus on inserting prosodic breaks (pauses), setting the durations of these pauses, and adjusting the signing rate differentially for ASL animations based on sentence syntax and other features. The methodology includes processing and cleaning data from an ASL corpus with motion-capture recordings, selecting features, and building machine learning models to predict where to insert pauses, the duration of those pauses, and the signing speed. The resulting models were evaluated using a cross-validation approach, training and testing multiple models on different partitions of the dataset to compare various learning algorithms and subsets of features. In addition, a user-based evaluation was conducted in which native ASL signers evaluated animations generated from these models. This paper summarizes the motivation for this work, the proposed solution, and its potential contributions, describing both completed work and plans for future research.
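As a rough illustration of the cross-validation methodology described above, the sketch below compares two learning algorithms for predicting whether a prosodic break (pause) should be inserted at a given sign boundary. The feature names, label construction, and choice of estimators are assumptions made for illustration only; they are not the paper's actual feature set or models.

```python
# Illustrative sketch: cross-validation comparison of learning algorithms
# for predicting whether a pause should be inserted after a given sign.
# Features, labels, and models here are placeholders, not the paper's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-boundary features derived from a corpus, e.g.:
#   [signs_since_last_pause, syntactic_clause_boundary (0/1), sentence_length]
X = rng.random((500, 3))
y = (X[:, 1] > 0.5).astype(int)  # placeholder labels: pause at clause boundary

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 10-fold cross-validation: train and test on different partitions of the
# dataset and compare mean accuracy across learning algorithms.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Analogous regression models (predicting pause duration in milliseconds or per-sign signing rate) could be evaluated the same way by swapping in regressors and an error metric such as mean absolute error.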