Detecting Hand-Head Occlusions in Sign Language Video

Much current linguistic research on sign language is based on analyzing large corpora of video recordings, which requires either manual or automatic annotation of the videos. In this paper we introduce methods for automatically detecting and classifying hand-head occlusions in sign language videos. Linguistically, hand-head occlusions are an important subject of study, as the head is a structural place of articulation in many signs. Our method combines easily computable local video properties with global hand tracking. Experiments carried out on videos from the Suvi on-line dictionary of Finnish Sign Language show that the proposed local method detects occlusion events with a sensitivity of 92.6%. When global hand tracking is incorporated into the method, the specificity reaches 93.7% while the detection sensitivity remains above 90%.
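The reported figures follow the standard definitions of sensitivity (the fraction of true occlusion events that are detected) and specificity (the fraction of non-occlusion cases correctly rejected). The sketch below illustrates this evaluation; the per-frame boolean labels and the simple bounding-box occlusion test are illustrative assumptions for the sketch, not the paper's actual features or tracker.

```python
# Minimal sketch (not the paper's implementation) of per-frame
# occlusion detection and sensitivity/specificity evaluation.
# The box-overlap criterion below is an assumed stand-in for the
# combined local-property / hand-tracking method in the paper.

from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_overlap(hand: Box, head: Box) -> bool:
    """Hypothetical occlusion test: tracked hand box intersects head box."""
    return not (hand.x1 < head.x0 or head.x1 < hand.x0 or
                hand.y1 < head.y0 or head.y1 < hand.y0)


def sensitivity_specificity(truth: list, pred: list):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy per-frame ground-truth vs. predicted occlusion labels.
    truth = [True, True, False, False, True, False]
    pred = [True, False, False, True, True, False]
    sens, spec = sensitivity_specificity(truth, pred)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```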
