Dynamic Time Warping Based Sign Retrieval

Sign language, a visual language, is the main mode of communication for the Deaf. A large body of audio-visual material is annotated with sign language for use by the Deaf. Enabling sign-based retrieval from these sources serves two purposes: (i) Deaf people will be able to search using sign language; (ii) people who do not know the meaning of a sign will be able to search for that sign in its source and infer the meaning from the accompanying subtitles and speech. In this work, we develop a baseline system based on dynamic time warping for sign-sample-based retrieval in videos. We analyze the effects of different features, distance measures, and temporal reduction techniques on query performance.
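To make the retrieval idea concrete, the following is a minimal sketch of dynamic time warping used to rank candidate video segments against a query sign. It is not the system described above: the function names `dtw_distance` and `rank_segments`, the Euclidean frame-to-frame distance, and the assumption that the video has already been cut into candidate segments of per-frame feature vectors are all illustrative choices.

```python
import numpy as np

def dtw_distance(query, candidate):
    """Classic DTW cost between two feature sequences of shape (frames, dims).

    Assumption: Euclidean distance between frames; other local distance
    measures could be substituted here.
    """
    query = np.asarray(query, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    n, m = len(query), len(candidate)

    # Pairwise frame-to-frame distances, shape (n, m).
    local = np.linalg.norm(query[:, None, :] - candidate[None, :, :], axis=2)

    # Accumulated cost matrix with an extra row and column of infinities
    # so that every alignment path starts at the first frame of both sequences.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j],      # query advances, candidate frame repeats
                acc[i, j - 1],      # candidate advances, query frame repeats
                acc[i - 1, j - 1],  # both advance
            )
    return acc[n, m]

def rank_segments(query, segments):
    """Return segment indices ordered from best to worst DTW match."""
    distances = [dtw_distance(query, seg) for seg in segments]
    return sorted(range(len(segments)), key=lambda k: distances[k])
```

Because DTW allows frames to repeat along the alignment path, a query matches a slower or faster performance of the same sign at low cost, which is why it is a natural baseline when the query and the target are produced at different speeds.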
