Paper information - Machine learning for gesture recognition from videos

[1] Peter Wittenburg,et al. Improving Native Language Identification with TF-IDF Weighting , 2013, BEA@NAACL-HLT.

[2] Przemyslaw Lenkiewicz,et al. Application of video processing methods for linguistic research , 2011, LTC 2011.

[3] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[4] Gerald Friedland,et al. An adaptive initialization method for speaker Diarization based on prosodic features , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5] Paul A. Viola,et al. Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[6] Bin Ma,et al. A Vector Space Modeling Approach to Spoken Language Identification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[7] P. KaewTrakulPong,et al. An Improved Adaptive Background Mixture Model for Real-time Tracking with Shadow Detection , 2002 .

[8] Ted E. Dunning,et al. Statistical Identification of Language , 1994 .

[9] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[10] Sileye O. Ba,et al. Speech/Non-Speech Detection in Meetings from Automatically Extracted low Resolution Visual Features , 2010, ICASSP.

[11] Jonas Beskow,et al. Visual Recognition of Isolated Swedish Sign Language Signs , 2012, ArXiv.

[12] Hervé Bourlard,et al. Unknown-multiple speaker clustering using HMM , 2002, INTERSPEECH.

[13] Douglas A. Reynolds,et al. Speaker diarisation for broadcast news , 2004, Odyssey.

[14] Peter Wittenburg,et al. Motion history images for online speaker/signer diarization , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15] Willem J. M. Levelt,et al. Pointing and voicing in deictic expressions , 1985 .

[16] Marcos Zampieri,et al. Automatic identification of language varieties: The case of Portuguese , 2012, KONVENS.

[17] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[18] Satoshi Nakamura,et al. Never-ending learning system for on-line speaker diarization , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[19] Tom Fawcett,et al. An introduction to ROC analysis , 2006, Pattern Recognit. Lett.

[20] José Manuel Pardo,et al. Robust Speaker Diarization for meetings , 2006 .

[21] Edward H. Adelson,et al. PYRAMID METHODS IN IMAGE PROCESSING. , 1984 .

[22] Mickael Rouvier,et al. An open-source state-of-the-art toolbox for broadcast news diarization , 2013, INTERSPEECH.

[23] J. Coates,et al. Turn‐taking patterns in deaf conversation , 2001 .

[24] E. Zwicker,et al. Subdivision of the audible frequency range into critical bands , 1961 .

[25] Dariu Gavrila,et al. The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst.

[26] Gerald Friedland,et al. Live speaker identification in conversations , 2008, ACM Multimedia.

[27] Rachel I. Mayberry,et al. Language and Gesture: Gesture production during stuttered speech: insights into the nature of gesture–speech integration , 2000 .

[28] W. Stokoe,et al. Sign language structure: an outline of the visual communication systems of the American deaf. 1960. , 1961, Journal of deaf studies and deaf education.

[29] Xavier Anguera Miró,et al. Friends and enemies: a novel initialization for speaker diarization , 2006, INTERSPEECH.

[30] Ying Wu,et al. Vision-Based Gesture Recognition: A Review , 1999, Gesture Workshop.

[31] Slim Essid,et al. A Multimodal Approach to Speaker Diarization on TV Talk-Shows , 2013, IEEE Transactions on Multimedia.

[32] E. Klima. The signs of language , 1979 .

[33] Douglas A. Reynolds,et al. An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[34] Peter Wittenburg,et al. Automatic sign language identification , 2013, 2013 IEEE International Conference on Image Processing.

[35] Vladimir Vezhnevets,et al. A Survey on Pixel-Based Skin Color Detection Techniques , 2003 .

[36] Jitendra Ajmera,et al. A robust speaker clustering algorithm , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[37] Douglas A. Reynolds,et al. Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process.

[38] Alex Pentland,et al. Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video , 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[39] Vittorio Murino,et al. Look at Who's Talking: Voice Activity Detection by Automated Gesture Analysis , 2011, AmI Workshops.

[40] Hugo Guterman,et al. Initialization of Iterative-Based Speaker Diarization Systems for Telephone Conversations , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[41] A. Kendon. Some Relationships Between Body Motion and Speech , 1972 .

[42] Timothy Baldwin,et al. Language Identification: The Long and the Short of the Matter , 2010, NAACL.

[43] Isabel Trancoso,et al. Accent identification , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[44] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[45] Binyam Gebrekidan Gebre,et al. Classifying pluricentric languages: Extending the monolingual model , 2012 .

[46] Douglas A. Reynolds,et al. Approaches to language identification using Gaussian mixture models and shifted delta cepstral features , 2002, INTERSPEECH.

[47] J. Lannoy,et al. Gestures and Speech: Psychological Investigations , 1991 .

[48] Geoffrey Zweig,et al. An empirical study of automatic accent classification , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[49] S. Shattuck-Hufnagel. A reply to McNeill. , 1982 .

[50] David A. van Leeuwen,et al. Speaker Diarization Error Analysis Using Oracle Components , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[51] Marcos Zampieri,et al. N-gram Language Models and POS Distribution for the Identification of Spanish Varieties (Ngrammes et Traits Morphosyntaxiques pour la Identification de Variétés de l’Espagnol) [in French] , 2013, JEP/TALN/RECITAL.

[52] Marijn Huijbregts,et al. The ICSI RT07s Speaker Diarization System , 2007, CLEAR.

[53] Sarah Florence Taub,et al. Language from the Body: Iconicity and Metaphor in American Sign Language , 2001 .

[54] P. Peer,et al. Human skin color clustering for face detection , 2003, The IEEE Region 8 EUROCON 2003. Computer as a Tool..

[55] Scott K. Liddell,et al. American Sign Language: The Phonological Base , 2013 .

[56] D. Rubin,et al. Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[57] D. McNeill. Gesture and Thought , 2005 .

[58] Hervé Bourlard,et al. Using audio and visual cues for speaker diarisation initialisation , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[59] Alex Pentland,et al. Real-time American Sign Language recognition from video using hidden Markov models , 1995 .

[60] Olivier Cappé,et al. Soft nonnegative matrix co-factorization with application to multimodal speaker diarization , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[61] E. Schegloff,et al. A simplest systematics for the organization of turn-taking for conversation , 1974 .

[62] P. Kay,et al. Universals and cultural variation in turn-taking in conversation , 2009, Proceedings of the National Academy of Sciences.

[63] Trevor Darrell,et al. Integrated Person Tracking Using Stereo, Color, and Pattern Detection , 2000, International Journal of Computer Vision.

[64] Marijn Huijbregts,et al. Segmentation, diarization and speech transcription : surprise data unraveled , 2008 .

[65] Jonathan G. Fiscus,et al. The Rich Transcription 2007 Meeting Recognition Evaluation , 2007, CLEAR.

[66] Andrew Y. Ng,et al. Learning Feature Representations with K-Means , 2012, Neural Networks: Tricks of the Trade.

[67] Aaron E. Rosenberg,et al. Unsupervised speaker segmentation of telephone conversations , 2002, INTERSPEECH.

[68] Md. Atiqur Rahman Ahad,et al. Motion History Images for Action Recognition and Understanding , 2012, SpringerBriefs in Computer Science.

[69] Chuohao Yeo,et al. Multi-modal speaker diarization of real-world meetings using compressed-domain video features , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[70] Peter Wittenburg,et al. Unsupervised Feature Learning for Visual Sign Language Identification , 2014, ACL.

[71] Ming-Kuei Hu,et al. Visual pattern recognition by moment invariants , 1962, IRE Trans. Inf. Theory.

[72] S Goldin-Meadow,et al. What's communication got to do with it? Gesture in children blind from birth. , 1997, Developmental psychology.

[73] Peter Wittenburg,et al. The gesturer is the speaker , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[74] Susan Goldin-Meadow,et al. The Relation Between Gesture and Speech in Congenitally Blind and Sighted Language-Learners , 2000 .

[75] Lirong Dai,et al. Deep Bottleneck Features for Spoken Language Identification , 2014, PloS one.

[76] Sudeep Sarkar,et al. Exploring Co-Occurence Between Speech and Body Movement for Audio-Guided Video Localization , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[77] Honglak Lee,et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[78] J. Hawkins,et al. On Intelligence , 2004 .

[79] M. Studdert-Kennedy. Hand and Mind: What Gestures Reveal About Thought. , 1994 .

[80] Thomas Fillon,et al. YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software , 2010, ISMIR.

[81] M. Picheny,et al. Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences , 2017 .

[82] Chin-Hui Lee,et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process.

[83] Gerald Friedland,et al. A hybrid approach to online speaker diarization , 2010, INTERSPEECH.

[84] Peter Wittenburg,et al. Automatic Signer Diarization - The Mover Is the Signer Approach , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[85] Nicholas W. D. Evans,et al. Speaker Diarization: A Review of Recent Research , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[86] Peter Wittenburg,et al. Speaker diarization using gesture and speech , 2014, INTERSPEECH.

[87] Peter Wittenburg,et al. Annotation by Category: ELAN and ISO DCR , 2008, LREC.

[88] Carlo Tomasi,et al. Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[89] J. Makhoul,et al. Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[90] Gerald Friedland,et al. Robust Speaker Diarization for short speech recordings , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[91] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process.

[92] Douglas E. Sturim,et al. The MITLL NIST LRE 2015 Language Recognition System , 2016, Odyssey.

[93] Przemyslaw Lenkiewicz,et al. AV Processing in eHumanities - a paradigm shift , 2012, DH.

[94] William M. Campbell,et al. Acoustic, phonetic, and discriminative approaches to automatic language identification , 2003, INTERSPEECH.

[95] Erkki Oja,et al. Independent component analysis: algorithms and applications , 2000, Neural Networks.

[96] James W. Davis,et al. The representation and recognition of human movement using temporal templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[97] Paul A. Viola,et al. Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[98] Przemyslaw Lenkiewicz,et al. Towards Automatic Gesture Stroke Detection , 2012, LREC.

[99] Petros Maragos,et al. Sign Language Recognition, Generation, and Modelling: A Research Effort with Applications in Deaf Communication , 2009, HCI.

[100] David A. van Leeuwen,et al. The AMI Speaker Diarization System for NIST RT06s Meeting Data , 2006, MLMI.

[101] Leo Breiman,et al. Random Forests , 2001, Machine Learning.

[102] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[103] Hermann Hienz,et al. Video-based continuous sign language recognition using statistical methods , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[104] Nicolas Pugeault,et al. Sign language recognition using sub-units , 2012, J. Mach. Learn. Res.

[105] Marcos Zampieri,et al. VarClass: An Open-source Language Identification Tool for Language Varieties , 2014, LREC.

[106] R. Real,et al. AUC: a misleading measure of the performance of predictive distribution models , 2008 .

[107] Honglak Lee,et al. Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[108] A. Kendon. Gesticulation and Speech: Two Aspects of the Process of Utterance , 1981 .

[109] Joel R. Tetreault,et al. A Report on the First Native Language Identification Shared Task , 2013, BEA@NAACL-HLT.

[110] Gerald Friedland,et al. The ICSI RT-09 Speaker Diarization System , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[111] Fabio Valente,et al. DiarTk : An Open Source Toolkit for Research in Multistream Speaker Diarization and its Application to Meetings Recordings , 2012, INTERSPEECH.