Fingerspelling Recognition with Support Vector Machines and Hidden Conditional Random Fields - A Comparison with Neural Networks and Hidden Markov Models

In this paper, we describe our experiments with Hidden Conditional Random Fields (HCRFs) and Support Vector Machines (SVMs) applied to fingerspelling recognition in the Brazilian Sign Language (LIBRAS). We compare these models against the more common approaches based on Artificial Neural Networks (ANNs) and Hidden Markov Models (HMMs), reporting statistically significant results under k-fold cross-validation. We also explore how specific behaviors of the Gaussian kernel affect performance and sparseness. To perform multi-class classification with SVMs, we use large-margin Directed Acyclic Graphs, which achieve faster evaluation rates. Both the ANNs and the HCRFs were trained using the Resilient Backpropagation algorithm. We validate our results using Cohen’s Kappa tests for contingency tables.
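As a rough illustration of the validation statistic mentioned above, the following minimal sketch computes Cohen's Kappa from a square contingency (confusion) table. The function name `cohen_kappa` and the example table are hypothetical and are not taken from the paper; the sketch covers only the point estimate, not the significance test built on it.

```python
def cohen_kappa(table):
    """Cohen's Kappa for a square contingency table.

    table[i][j] holds the number of samples of true class i
    that were assigned to class j (hypothetical layout).
    """
    k = len(table)
    n = sum(sum(row) for row in table)

    # Observed agreement: fraction of samples on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n

    # Chance agreement: product of row and column marginals, summed per class.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    # Undefined when p_e == 1 (all mass in a single class); not handled here.
    return (p_o - p_e) / (1 - p_e)


# Illustrative two-class table, not data from the paper.
example = [[20, 5],
           [10, 15]]
kappa = cohen_kappa(example)
```

For the illustrative table, the observed agreement is 0.7 and the chance agreement is 0.5, so Kappa comes out at about 0.4. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.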
