Andrew Zisserman | Themos Stafylakis | Triantafyllos Afouras | Samuel Albanie | Liliane Momeni
[1] Minjae Lee, et al. Online Keyword Spotting with a Character-Level Recurrent Neural Network, 2015, ArXiv.
[2] Vikrant Singh Tomar, et al. Efficient keyword spotting using time delay neural networks, 2018, INTERSPEECH.
[3] Maja Pantic, et al. Audio-Visual Speech Recognition with a Hybrid CTC/Attention Architecture, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[4] Sercan Ömer Arik, et al. Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting, 2017, INTERSPEECH.
[5] Awni Y. Hannun, et al. An End-to-End Architecture for Keyword Spotting and Voice Activity Detection, 2016, ArXiv.
[6] Joon Son Chung, et al. Deep Audio-Visual Speech Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[7] Hong Liu, et al. Audio-visual keyword spotting based on adaptive decision fusion under noisy conditions for human-robot interaction, 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).
[8] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[9] Bhuvana Ramabhadran, et al. End-to-end speech recognition and keyword search on low-resource languages, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Joon Son Chung, et al. ASR is All You Need: Cross-Modal Distillation for Lip Reading, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[12] Yundong Zhang, et al. Hello Edge: Keyword Spotting on Microcontrollers, 2017, ArXiv.
[13] Georg Heigold, et al. Small-footprint keyword spotting using deep neural networks, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Themos Stafylakis, et al. Combining Residual Networks with LSTMs for Lipreading, 2017, INTERSPEECH.
[15] Nikko Strom, et al. Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting, 2017, INTERSPEECH.
[16] Thomas Paine, et al. Large-Scale Visual Speech Recognition, 2018, INTERSPEECH.
[17] Richard F. Lyon, et al. Trainable frontend for robust and far-field keyword spotting, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Brian Kingsbury, et al. End-to-end ASR-free keyword search from speech, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Hong Liu, et al. A Novel Lip Descriptor for Audio-Visual Keyword Spotting Based on Adaptive Decision Fusion, 2016, IEEE Transactions on Multimedia.
[21] Jürgen Schmidhuber, et al. An Application of Recurrent Neural Networks to Discriminative Keyword Spotting, 2007, ICANN.
[22] Nikko Strom, et al. Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting, 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[23] Tara N. Sainath, et al. Convolutional neural networks for small-footprint keyword spotting, 2015, INTERSPEECH.
[24] John H. L. Hansen, et al. Babble Noise: Modeling, Analysis, and Applications, 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[25] Joon Son Chung, et al. BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues, 2020, ECCV.
[26] Joon Son Chung, et al. The Conversation: Deep Audio-Visual Speech Enhancement, 2018, INTERSPEECH.
[27] Wei Li, et al. Streaming small-footprint keyword spotting using sequence-to-sequence models, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[28] Shouyi Yin, et al. Small-Footprint Keyword Spotting with Graph Convolutional Network, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[29] Lukáš Burget, et al. Comparison of keyword spotting approaches for informal continuous speech, 2005, INTERSPEECH.
[30] Joon Son Chung, et al. Out of Time: Automated Lip Sync in the Wild, 2016, ACCV Workshops.
[31] Liang Zheng, et al. Spotting Visual Keywords from Temporal Sliding Windows, 2019, ICMI.
[32] Juhan Nam, et al. Temporal Feedback Convolutional Recurrent Neural Networks for Keyword Spotting, 2019, ArXiv.
[33] Sankaran Panchapagesan, et al. Model Compression Applied to Small-Footprint Keyword Spotting, 2016, INTERSPEECH.
[34] Junbo Zhang, et al. Sequence-to-sequence Models for Small-Footprint Keyword Spotting, 2018, ArXiv.
[35] C. V. Jawahar, et al. Word Spotting in Silent Lip Videos, 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[36] Themos Stafylakis, et al. Zero-shot keyword spotting for visual speech recognition in-the-wild, 2018, ECCV.
[37] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Transactions on Signal Processing.
[38] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[39] Joon Son Chung, et al. Deep Lip Reading: a comparison of models and an online application, 2018, INTERSPEECH.
[40] Joon Son Chung, et al. Lip Reading Sentences in the Wild, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Joon Son Chung, et al. Signs in time: Encoding human motion as a temporal image, 2016, ArXiv.
[43] Kai Yu, et al. Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC, 2016, INTERSPEECH.
[44] Dimitri Palaz, et al. Jointly Learning to Locate and Classify Words Using Convolutional Networks, 2016, INTERSPEECH.
[45] Shimon Whiteson, et al. LipNet: Sentence-level Lipreading, 2016, ArXiv.
[46] Joon Son Chung, et al. LRS3-TED: a large-scale dataset for visual speech recognition, 2018, ArXiv.