Paper Information - Automatic Segmentation of Sign Language into Subtitle-Units

Automatic Segmentation of Sign Language into Subtitle-Units

We present baseline results for a new task of automatic segmentation of Sign Language video into sentence-like units. We use a corpus of natural Sign Language video with accurately aligned subtitles to train a spatio-temporal graph convolutional network with a BiLSTM on 2D skeleton data to automatically detect the temporal boundaries of subtitles. In doing so, we segment Sign Language video into subtitle-units that can be translated into phrases in a written language. We achieve a ROC-AUC statistic of 0.87 at the frame level and 92% label accuracy within a time margin of 0.6s of the true labels.
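
The two evaluation ideas named in the abstract, frame-level labels derived from subtitle timings and label accuracy within a time margin, can be illustrated with a small sketch. This is not the authors' evaluation code. The helper names (`subtitle_frame_labels`, `accuracy_with_margin`), the 25 fps frame rate, the toy subtitle timings, and the exact definition of "correct within the margin" (a predicted frame label counts as correct if the same label occurs in the reference within the margin) are all assumptions made for illustration.

```python
def subtitle_frame_labels(subtitles, n_frames, fps):
    """Return one 0/1 label per frame: 1 inside a subtitle interval, 0 outside.

    `subtitles` is a list of (start_seconds, end_seconds) pairs (hypothetical format).
    """
    labels = [0] * n_frames
    for start_s, end_s in subtitles:
        first = max(0, round(start_s * fps))
        last = min(n_frames, round(end_s * fps))
        for i in range(first, last):
            labels[i] = 1
    return labels


def accuracy_with_margin(predicted, reference, margin_frames):
    """Fraction of frames whose predicted label also appears in the
    reference labels within +/- margin_frames (assumed definition)."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    hits = 0
    for i, label in enumerate(predicted):
        lo = max(0, i - margin_frames)
        hi = min(len(reference), i + margin_frames + 1)
        if label in reference[lo:hi]:
            hits += 1
    return hits / len(predicted)


fps = 25  # assumed frame rate, not taken from the paper
# Two toy subtitle units separated by a 0.4 s gap.
reference = subtitle_frame_labels([(0.0, 1.0), (1.4, 2.0)], n_frames=50, fps=fps)
# A toy prediction whose boundaries arrive 5 frames (0.2 s) late.
predicted = [0] * 5 + reference[:-5]

strict = accuracy_with_margin(predicted, reference, margin_frames=0)
tolerant = accuracy_with_margin(predicted, reference, margin_frames=round(0.6 * fps))
print(strict, tolerant)
```

With these toy inputs the tolerant score is higher than the strict one, because small boundary shifts are forgiven when the margin covers them. The paper's own metric may be defined differently.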

[1] Michèle Gouiffès, et al. MEDIAPI-SKEL - A 2D-Skeleton Video Database of French Sign Language With Aligned French Subtitles, 2020, LREC.

[2] Meredith Ringel Morris, et al. Sign Language Recognition, Generation, and Translation: An Interdisciplinary Perspective, 2019, ASSETS.

[3] Sang-Ki Ko, et al. Neural Sign Language Translation based on Human Keypoint Estimation, 2018, Applied Sciences.

[4] Hermann Ney, et al. From Feedforward to Recurrent LSTM Neural Networks for Language Modeling, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[5] Olga Veksler, et al. Star Shape Prior for Graph-Cut Image Segmentation, 2008, ECCV.

[6] Robert de Beaugrande, et al. Sentence first, verdict afterwards: On the remarkable career of the “sentence”, 1999.

[7] Michèle Gouiffès, et al. Dicta-Sign-LSF-v2: Remake of a Continuous French Sign Language Dialogue Corpus and a First Baseline for Automatic Sign Language Processing, 2020, LREC.

[8] Hermann Ney, et al. Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Tony F. Chan, et al. Level set based shape prior segmentation, 2005, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10] Johanna Mesch, et al. Segmenting the Swedish Sign Language corpus: On the possibilities of using visual cues as a basis for syntactic segmentation, 2014, LREC.

[11] Dahua Lin, et al. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition, 2018, AAAI.

[12] Jie Huang, et al. Video-based Sign Language Recognition without Temporal Segmentation, 2018, AAAI.

[13] Onno Crasborn. How to recognise a sentence when you see one, 2007.

[14] Jordan Fenlon, et al. Seeing sentence boundaries, 2007.

[15] Lori Lamel, et al. Development and Evaluation of Automatic Punctuation for French and English Speech-to-Text, 2012, INTERSPEECH.

[16] H. Ney, et al. Towards Automatic Sign Language Annotation for the ELAN Tool, 2008.