Visual Language Identification from Facial Landmarks

Automatic Visual Language IDentification (VLID), the problem of identifying the language being spoken from visual information alone, with no audio, is studied. The proposed method employs facial landmarks detected automatically in video. A convex optimisation problem is formulated that jointly finds both the discriminative representation (a soft histogram over a set of lip shapes) and the classifier. A 10-fold cross-validation on a dataset of 644 videos collected from youtube.com yields an accuracy of 73% in pairwise discrimination between English and French (chance level is 50%). A study using 10 videos suggests that the proposed method performs better than the average human at discriminating between the two languages.
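To make the "soft histogram over a set of lip shapes" concrete, the following minimal sketch shows one common way such a representation can be computed and scored. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not specify the assignment rule, so a softmax over negative squared distances is assumed here, and the prototype shapes and classifier weights are taken as given rather than learned jointly by convex optimisation. All names (`soft_assign`, `soft_histogram`, `linear_score`, `beta`) are hypothetical.

```python
import math


def soft_assign(shape, prototypes, beta=1.0):
    """Soft-assign one lip-shape vector to each prototype.

    Assumption: weights are a softmax over negative squared Euclidean
    distances; the paper's actual assignment rule may differ.
    """
    d2 = [sum((s - p) ** 2 for s, p in zip(shape, proto)) for proto in prototypes]
    d_min = min(d2)  # shift by the minimum for numerical stability
    w = [math.exp(-beta * (d - d_min)) for d in d2]
    total = sum(w)
    return [x / total for x in w]


def soft_histogram(frames, prototypes, beta=1.0):
    """Average the per-frame soft assignments over a whole video."""
    if not frames:
        raise ValueError("at least one frame is required")
    hist = [0.0] * len(prototypes)
    for shape in frames:
        for i, a in enumerate(soft_assign(shape, prototypes, beta)):
            hist[i] += a
    return [h / len(frames) for h in hist]


def linear_score(hist, weights, bias=0.0):
    """Score a histogram with a linear classifier (sign gives the language)."""
    return sum(w * h for w, h in zip(weights, hist)) + bias


# Toy example: two 2-D prototype "lip shapes" and a two-frame video.
prototypes = [[0.0, 0.0], [1.0, 0.0]]
frames = [[0.0, 0.0], [1.0, 0.0]]
hist = soft_histogram(frames, prototypes)
score = linear_score(hist, weights=[1.0, -1.0])
```

Averaging soft assignments gives a fixed-length descriptor regardless of video length, which is what allows a single linear classifier to be applied to videos of different durations.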
