Machine learning for gesture recognition from videos

[1]  Peter Wittenburg,et al.  Improving Native Language Identification with TF-IDF Weighting , 2013, BEA@NAACL-HLT.

[2]  Przemyslaw Lenkiewicz,et al.  Application of video processing methods for linguistic research , 2011, LTC 2011.

[3]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[4]  Gerald Friedland,et al.  An adaptive initialization method for speaker Diarization based on prosodic features , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[6]  Bin Ma,et al.  A Vector Space Modeling Approach to Spoken Language Identification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  P. KaewTrakulPong,et al.  An Improved Adaptive Background Mixture Model for Real-time Tracking with Shadow Detection , 2002 .

[8]  Ted E. Dunning,et al.  Statistical Identification of Language , 1994 .

[9]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[10]  Sileye O. Ba,et al.  Speech/Non-Speech Detection in Meetings from Automatically Extracted low Resolution Visual Features , 2010, ICASSP.

[11]  Jonas Beskow,et al.  Visual Recognition of Isolated Swedish Sign Language Signs , 2012, ArXiv.

[12]  Hervé Bourlard,et al.  Unknown-multiple speaker clustering using HMM , 2002, INTERSPEECH.

[13]  Douglas A. Reynolds,et al.  Speaker diarisation for broadcast news , 2004, Odyssey.

[14]  Peter Wittenburg,et al.  Motion history images for online speaker/signer diarization , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Willem J. M. Levelt,et al.  Pointing and voicing in deictic expressions , 1985 .

[16]  Marcos Zampieri,et al.  Automatic identification of language varieties: The case of Portuguese , 2012, KONVENS.

[17]  Andrew L. Maas,et al.  Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[18]  Satoshi Nakamura,et al.  Never-ending learning system for on-line speaker diarization , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[19]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett.

[20]  José Manuel Pardo,et al.  Robust Speaker Diarization for meetings , 2006 .

[21]  Edward H. Adelson,et al.  Pyramid Methods in Image Processing , 1984 .

[22]  Mickael Rouvier,et al.  An open-source state-of-the-art toolbox for broadcast news diarization , 2013, INTERSPEECH.

[23]  J. Coates,et al.  Turn‐taking patterns in deaf conversation , 2001 .

[24]  E. Zwicker,et al.  Subdivision of the audible frequency range into critical bands , 1961 .

[25]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst.

[26]  Gerald Friedland,et al.  Live speaker identification in conversations , 2008, ACM Multimedia.

[27]  Rachel I. Mayberry,et al.  Language and Gesture: Gesture production during stuttered speech: insights into the nature of gesture–speech integration , 2000 .

[28]  W. Stokoe,et al.  Sign language structure: an outline of the visual communication systems of the American deaf. 1960. , 1961, Journal of deaf studies and deaf education.

[29]  Xavier Anguera Miró,et al.  Friends and enemies: a novel initialization for speaker diarization , 2006, INTERSPEECH.

[30]  Ying Wu,et al.  Vision-Based Gesture Recognition: A Review , 1999, Gesture Workshop.

[31]  Slim Essid,et al.  A Multimodal Approach to Speaker Diarization on TV Talk-Shows , 2013, IEEE Transactions on Multimedia.

[32]  E. Klima The signs of language , 1979 .

[33]  Douglas A. Reynolds,et al.  An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[34]  Peter Wittenburg,et al.  Automatic sign language identification , 2013, 2013 IEEE International Conference on Image Processing.

[35]  Vladimir Vezhnevets,et al.  A Survey on Pixel-Based Skin Color Detection Techniques , 2003 .

[36]  Jitendra Ajmera,et al.  A robust speaker clustering algorithm , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[37]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process.

[38]  Alex Pentland,et al.  Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video , 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[39]  Vittorio Murino,et al.  Look at Who's Talking: Voice Activity Detection by Automated Gesture Analysis , 2011, AmI Workshops.

[40]  Hugo Guterman,et al.  Initialization of Iterative-Based Speaker Diarization Systems for Telephone Conversations , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[41]  A. Kendon Some Relationships Between Body Motion and Speech , 1972 .

[42]  Timothy Baldwin,et al.  Language Identification: The Long and the Short of the Matter , 2010, NAACL.

[43]  Isabel Trancoso,et al.  Accent identification , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[44]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[45]  Binyam Gebrekidan Gebre,et al.  Classifying pluricentric languages: Extending the monolingual model , 2012 .

[46]  Douglas A. Reynolds,et al.  Approaches to language identification using Gaussian mixture models and shifted delta cepstral features , 2002, INTERSPEECH.

[47]  J. Lannoy,et al.  Gestures and Speech: Psychological Investigations , 1991 .

[48]  Geoffrey Zweig,et al.  An empirical study of automatic accent classification , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[49]  S. Shattuck-Hufnagel A reply to McNeill. , 1982 .

[50]  David A. van Leeuwen,et al.  Speaker Diarization Error Analysis Using Oracle Components , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[51]  Marcos Zampieri,et al.  N-gram Language Models and POS Distribution for the Identification of Spanish Varieties (Ngrammes et Traits Morphosyntaxiques pour la Identification de Variétés de l’Espagnol) [in French] , 2013, JEP/TALN/RECITAL.

[52]  Marijn Huijbregts,et al.  The ICSI RT07s Speaker Diarization System , 2007, CLEAR.

[53]  Sarah Florence Taub,et al.  Language from the Body: Iconicity and Metaphor in American Sign Language , 2001 .

[54]  P. Peer,et al.  Human skin color clustering for face detection , 2003, The IEEE Region 8 EUROCON 2003. Computer as a Tool..

[55]  Scott K. Liddell,et al.  American Sign Language: The Phonological Base , 2013 .

[56]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[57]  D. McNeill Gesture and Thought , 2005 .

[58]  Hervé Bourlard,et al.  Using audio and visual cues for speaker diarisation initialisation , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[59]  Alex Pentland,et al.  Real-time American Sign Language recognition from video using hidden Markov models , 1995 .

[60]  Olivier Cappé,et al.  Soft nonnegative matrix co-factorization with application to multimodal speaker diarization , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[61]  E. Schegloff,et al.  A simplest systematics for the organization of turn-taking for conversation , 1974 .

[62]  P. Kay,et al.  Universals and cultural variation in turn-taking in conversation , 2009, Proceedings of the National Academy of Sciences.

[63]  Trevor Darrell,et al.  Integrated Person Tracking Using Stereo, Color, and Pattern Detection , 2000, International Journal of Computer Vision.

[64]  Marijn Huijbregts,et al.  Segmentation, diarization and speech transcription : surprise data unraveled , 2008 .

[65]  Jonathan G. Fiscus,et al.  The Rich Transcription 2007 Meeting Recognition Evaluation , 2007, CLEAR.

[66]  Andrew Y. Ng,et al.  Learning Feature Representations with K-Means , 2012, Neural Networks: Tricks of the Trade.

[67]  Aaron E. Rosenberg,et al.  Unsupervised speaker segmentation of telephone conversations , 2002, INTERSPEECH.

[68]  Md. Atiqur Rahman Ahad,et al.  Motion History Images for Action Recognition and Understanding , 2012, SpringerBriefs in Computer Science.

[69]  Chuohao Yeo,et al.  Multi-modal speaker diarization of real-world meetings using compressed-domain video features , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[70]  Peter Wittenburg,et al.  Unsupervised Feature Learning for Visual Sign Language Identification , 2014, ACL.

[71]  Ming-Kuei Hu,et al.  Visual pattern recognition by moment invariants , 1962, IRE Trans. Inf. Theory.

[72]  S Goldin-Meadow,et al.  What's communication got to do with it? Gesture in children blind from birth. , 1997, Developmental psychology.

[73]  Peter Wittenburg,et al.  The gesturer is the speaker , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[74]  Susan Goldin-Meadow,et al.  The Relation Between Gesture and Speech in Congenitally Blind and Sighted Language-Learners , 2000 .

[75]  Lirong Dai,et al.  Deep Bottleneck Features for Spoken Language Identification , 2014, PloS one.

[76]  Sudeep Sarkar,et al.  Exploring Co-Occurence Between Speech and Body Movement for Audio-Guided Video Localization , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[77]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[78]  J. Hawkins,et al.  On Intelligence , 2004 .

[79]  M. Studdert-Kennedy Hand and Mind: What Gestures Reveal About Thought. , 1994 .

[80]  Thomas Fillon,et al.  YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software , 2010, ISMIR.

[81]  M. Picheny,et al.  Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences , 2017 .

[82]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process.

[83]  Gerald Friedland,et al.  A hybrid approach to online speaker diarization , 2010, INTERSPEECH.

[84]  Peter Wittenburg,et al.  Automatic Signer Diarization - The Mover Is the Signer Approach , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[85]  Nicholas W. D. Evans,et al.  Speaker Diarization: A Review of Recent Research , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[86]  Peter Wittenburg,et al.  Speaker diarization using gesture and speech , 2014, INTERSPEECH.

[87]  Peter Wittenburg,et al.  Annotation by Category: ELAN and ISO DCR , 2008, LREC.

[88]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[89]  J. Makhoul,et al.  Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[90]  Gerald Friedland,et al.  Robust Speaker Diarization for short speech recordings , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[91]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process.

[92]  Douglas E. Sturim,et al.  The MITLL NIST LRE 2015 Language Recognition System , 2016, Odyssey.

[93]  Przemyslaw Lenkiewicz,et al.  AV Processing in eHumanities - a paradigm shift , 2012, DH.

[94]  William M. Campbell,et al.  Acoustic, phonetic, and discriminative approaches to automatic language identification , 2003, INTERSPEECH.

[95]  Erkki Oja,et al.  Independent component analysis: algorithms and applications , 2000, Neural Networks.

[96]  James W. Davis,et al.  The representation and recognition of human movement using temporal templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[97]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[98]  Przemyslaw Lenkiewicz,et al.  Towards Automatic Gesture Stroke Detection , 2012, LREC.

[99]  Petros Maragos,et al.  Sign Language Recognition, Generation, and Modelling: A Research Effort with Applications in Deaf Communication , 2009, HCI.

[100]  David A. van Leeuwen,et al.  The AMI Speaker Diarization System for NIST RT06s Meeting Data , 2006, MLMI.

[101]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[102]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[103]  Hermann Hienz,et al.  Video-based continuous sign language recognition using statistical methods , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[104]  Nicolas Pugeault,et al.  Sign language recognition using sub-units , 2012, J. Mach. Learn. Res.

[105]  Marcos Zampieri,et al.  VarClass: An Open-source Language Identification Tool for Language Varieties , 2014, LREC.

[106]  R. Real,et al.  AUC: a misleading measure of the performance of predictive distribution models , 2008 .

[107]  Honglak Lee,et al.  Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[108]  A. Kendon Gesticulation and Speech: Two Aspects of the Process of Utterance , 1981 .

[109]  Joel R. Tetreault,et al.  A Report on the First Native Language Identification Shared Task , 2013, BEA@NAACL-HLT.

[110]  Gerald Friedland,et al.  The ICSI RT-09 Speaker Diarization System , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[111]  Fabio Valente,et al.  DiarTk : An Open Source Toolkit for Research in Multistream Speaker Diarization and its Application to Meetings Recordings , 2012, INTERSPEECH.