Visual recognition of human communication
[1] Satoshi Tamura,et al. Audio-visual speech recognition using deep bottleneck features and high-performance lipreading , 2015, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[2] John H. L. Hansen,et al. High performance digit recognition in real car environments , 2002, INTERSPEECH.
[3] Andrew Zisserman,et al. Domain-Adaptive Discriminative One-Shot Learning of Gestures , 2014, ECCV.
[4] Petros Maragos,et al. Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[5] Thad Starner,et al. American sign language recognition with the kinect , 2011, ICMI '11.
[6] Simon Osindero,et al. Conditional Generative Adversarial Nets , 2014, ArXiv.
[7] Alexei A. Efros,et al. Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[9] Oliver Durr,et al. Speaker identification and clustering using convolutional neural networks , 2016, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
[10] Wei Li,et al. 3D SMoSIFT: three-dimensional sparse motion scale invariant feature transform for activity recognition from RGB-D videos , 2014, J. Electronic Imaging.
[11] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[12] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[13] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[14] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..
[15] Alex Graves,et al. Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.
[16] Mark Liberman,et al. Speaker identification on the SCOTUS corpus , 2008 .
[17] Patrick Kenny,et al. Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms , 2006 .
[18] Andrew Zisserman,et al. Large-scale Learning of Sign Language by Watching TV (Using Co-occurrences) , 2013, BMVC.
[19] Rico Sennrich,et al. Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.
[20] Rainer Lienhart,et al. Reliable Transition Detection in Videos: A Survey and Practitioner's Guide , 2001, Int. J. Image Graph..
[21] Koray Kavukcuoglu,et al. Pixel Recurrent Neural Networks , 2016, ICML.
[22] Vaibhava Goel,et al. Detecting audio-visual synchrony using deep neural networks , 2015, INTERSPEECH.
[23] Themos Stafylakis,et al. Combining Residual Networks with LSTMs for Lipreading , 2017, INTERSPEECH.
[24] Matti Pietikäinen,et al. A review of recent advances in visual speech decoding , 2014, Image Vis. Comput..
[25] Alex Park,et al. The MIT Mobile Device Speaker Verification Corpus: Data Collection and Preliminary Experiments , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.
[26] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[27] Hermann Ney,et al. Deep Sign: Hybrid CNN-HMM for Continuous Sign Language Recognition , 2016, BMVC.
[28] Abhinav Gupta,et al. Unsupervised Learning of Visual Representations Using Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[29] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[31] Themos Stafylakis,et al. Deep Word Embeddings for Visual Speech Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Timothy F. Cootes,et al. Extraction of Visual Features for Lipreading , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[33] John Lewis. Automated lip-sync: Background and techniques , 1991, Comput. Animat. Virtual Worlds.
[34] Wilmot Li,et al. Content-based tools for editing audio stories , 2013, UIST.
[35] Sergey Ioffe,et al. Probabilistic Linear Discriminant Analysis , 2006, ECCV.
[36] Andrew Zisserman,et al. Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos , 2014, ACCV.
[37] Geoffrey E. Hinton,et al. Training Recurrent Neural Networks , 2013 .
[38] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Sébastien Marcel,et al. MOBIO Database for the ICPR 2010 Face and Speech Competition , 2009 .
[40] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[41] Grzegorz Kondrak,et al. A New Algorithm for the Alignment of Phonetic Sequences , 2000, ANLP.
[42] Barry-John Theobald,et al. Comparing visual features for lipreading , 2009, AVSP.
[43] Kee-Eung Kim,et al. Multi-view Automatic Lip-Reading Using Neural Network , 2016, ACCV Workshops.
[44] Wei Li,et al. One-shot learning gesture recognition from RGB-D data using bag of features , 2013, J. Mach. Learn. Res..
[45] Sudeep Sarkar,et al. Similarity Measure between Two Gestures Using Triplets , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[46] Tetsuya Ogata,et al. Lipreading using convolutional neural network , 2014, INTERSPEECH.
[47] Aaron Lawson,et al. The Speakers in the Wild (SITW) Speaker Recognition Database , 2016, INTERSPEECH.
[48] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.
[49] Josephine Sullivan,et al. One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[50] Marwan Mattar,et al. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments , 2008 .
[51] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.
[52] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[53] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[54] Andrea Vedaldi,et al. MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.
[55] Andreas Stolcke,et al. The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[56] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[57] Andrew Zisserman,et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.
[58] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[59] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[60] Hermann Ney,et al. Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data is Continuous and Weakly Labelled , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[63] T. K. Vintsyuk. Speech discrimination by dynamic programming , 1968 .
[64] M. Marschark,et al. The Oxford Handbook of Deaf Studies, Language, and Education, Volume 2. , 2010 .
[65] Andrew Zisserman,et al. Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[66] Douglas A. Reynolds,et al. Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..
[67] Jürgen Schmidhuber,et al. Lipreading with long short-term memory , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[68] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[69] Iasonas Kokkinos,et al. Understanding Objects in Detail with Fine-Grained Attributes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[70] Hermann Ney,et al. Deep Learning of Mouth Shapes for Sign Language , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).
[71] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2009, IEEE Transactions on Multimedia.
[72] D. Hubel,et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.
[73] Ming Liu,et al. AVICAR: audio-visual speech corpus in a car environment , 2004, INTERSPEECH.
[74] Andreas Stolcke,et al. Artificial neural network features for speaker diarization , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[75] Satoshi Tamura,et al. GIF-LR:GA-based informative feature for lipreading , 2012, Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference.
[76] Matti Pietikäinen,et al. A Compact Representation of Visual Speech Data Using Latent Variables , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[77] Ira Kemelmacher-Shlizerman,et al. Synthesizing Obama , 2017, ACM Trans. Graph..
[78] J.N. Gowdy,et al. CUAVE: A new audio-visual database for multimodal human-computer interface research , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[79] Hongbin Zha,et al. Unsupervised Random Forest Manifold Alignment for Lipreading , 2013, 2013 IEEE International Conference on Computer Vision.
[80] A. Murat Tekalp,et al. Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis , 2007, IEEE Transactions on Multimedia.
[81] Alexei A. Efros,et al. Colorful Image Colorization , 2016, ECCV.
[82] Maja Pantic,et al. Deep complementary bottleneck features for visual speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[83] Edward H. Adelson,et al. Learning visual groups from co-occurrences in space and time , 2015, ArXiv.
[84] David F. McAllister,et al. Lip synchronization of speech , 1997, AVSP.
[85] D. Bitzer,et al. Automated lip-sync: direct translation of speech-sound to mouth-shape , 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.
[86] Ivan Laptev,et al. Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[87] Tetsuya Ogata,et al. Audio-visual speech recognition using deep learning , 2014, Applied Intelligence.
[88] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[89] Matti Pietikäinen,et al. Concatenated Frame Image Based CNN for Visual Speech Recognition , 2016, ACCV Workshops.
[90] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[91] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[92] Laura Cristina Lanzarini,et al. LSA64: An Argentinian Sign Language Dataset , 2023, ArXiv.
[93] Lara Lynn Stoll,et al. Finding Difficult Speakers in Automatic Speaker Recognition , 2011 .
[94] David A. van Leeuwen,et al. NFI-FRITS: A forensic speaker recognition database and some first experiments , 2014, Odyssey.
[95] Alexei A. Efros,et al. Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[96] Ruslan Salakhutdinov,et al. Action Recognition using Visual Attention , 2015, NIPS 2015.
[97] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[98] Vaibhava Goel,et al. Deep multimodal learning for Audio-Visual Speech Recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[99] Desmond Morris,et al. The naked ape : a zoologist's study of the human animal , 1968 .
[100] Yisong Yue,et al. A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph..
[101] Enrique Argones-Rúa,et al. Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden markov models , 2009, Pattern Analysis and Applications.
[102] Stan Sclaroff,et al. Exploiting phonological constraints for handshape inference in ASL video , 2011, CVPR 2011.
[103] Georg Heigold,et al. End-to-end text-dependent speaker verification , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[104] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[105] Shengcai Liao,et al. Learning Face Representation from Scratch , 2014, ArXiv.
[106] Igor S. Pandzic,et al. A Real-Time Lip SYNC System Using a Genetic Algorithm for Automatic Neural Network Configuration , 2005, 2005 IEEE International Conference on Multimedia and Expo.
[107] Karl-Friedrich Kraiss,et al. Recent developments in visual sign language recognition , 2008, Universal Access in the Information Society.
[108] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[109] J.B. Millar,et al. The Australian National Database of Spoken Language , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[110] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[111] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..
[112] P.C. Woodland,et al. The 1994 HTK large vocabulary speech recognition system , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[113] Kyoung Mu Lee,et al. Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[114] Satoshi Nakamura,et al. Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[115] Patrick Pérez,et al. Poisson image editing , 2003, ACM Trans. Graph..
[116] Amirsina Torfi,et al. 3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition , 2017, IEEE Access.
[117] Douglas D. O'Shaughnessy,et al. Invited paper: Automatic speech recognition: History, methods and challenges , 2008, Pattern Recognit..
[118] Dominique Genoud,et al. POLYCOST: A telephone-speech database for speaker recognition , 2000, Speech Commun..
[119] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.
[120] A. Peregudov,et al. Relative timing of sound and vision: evaluation and correction , 2005, Proceedings of the Ninth International Symposium on Consumer Electronics, 2005. (ISCE 2005)..
[121] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.