Now You're Speaking My Language: Visual Language Identification
Joon Son Chung | Andrew Zisserman | Triantafyllos Afouras
[1] Ming Li, et al. End-to-end Language Identification using NetFV and NetVLAD, 2018, 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[2] Tomás Pajdla, et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[4] Ming Li, et al. Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[6] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Patrick Pérez, et al. Revisiting the VLAD image representation, 2013, ACM Multimedia.
[8] Hanna Mazzawi, et al. Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale, 2019, INTERSPEECH.
[9] Athena Vouloumanos, et al. Discriminating languages by speech-reading, 2007, Perception & psychophysics.
[10] Whitney M. Weikum, et al. Visual Language Discrimination in Infancy, 2007, Science.
[11] Aparna Brahme, et al. Lip Detection and Lip Geometric Feature Extraction using Constrained Local Model for Spoken Language Identification using Visual Speech Recognition, 2016.
[12] Roger Hsiao, et al. Improving Language Identification for Multilingual Speakers, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Douglas A. Reynolds, et al. Deep Neural Network Approaches to Speaker and Language Recognition, 2015, IEEE Signal Processing Letters.
[14] Bo Xu, et al. End-to-End Language Identification Using Attention-Based Recurrent Neural Networks, 2016, INTERSPEECH.
[15] Shilin Wang, et al. Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[16] Thomas Paine, et al. Large-Scale Visual Speech Recognition, 2018, INTERSPEECH.
[17] Joon Son Chung, et al. Lip Reading Sentences in the Wild, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Tetsuya Ogata, et al. Lipreading using convolutional neural network, 2014, INTERSPEECH.
[19] Maja Pantic, et al. Audio-Visual Speech Recognition with a Hybrid CTC/Attention Architecture, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[20] François Chollet, et al. Xception: Deep Learning with Depthwise Separable Convolutions, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Pedro J. Moreno, et al. A Real-Time End-to-End Multilingual Speech Recognition Architecture, 2015, IEEE Journal of Selected Topics in Signal Processing.
[22] Lukás Burget, et al. Language Recognition in iVectors Space, 2011, INTERSPEECH.
[23] Ming Li, et al. Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System, 2018, Odyssey.
[24] Omkar M. Parkhi, et al. VGGFace2: A Dataset for Recognising Faces across Pose and Age, 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[25] Alex Waibel, et al. Neural Codes to Factor Language in Multilingual Speech Recognition, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Themos Stafylakis, et al. Combining Residual Networks with LSTMs for Lipreading, 2017, INTERSPEECH.
[27] Jiri Matas, et al. Visual Language Identification from Facial Landmarks, 2017, SCIA.
[28] Roger Lass, et al. Phonology: An Introduction to Basic Concepts, 1984.
[29] Joon Son Chung, et al. VoxCeleb2: Deep Speaker Recognition, 2018, INTERSPEECH.
[30] Joon Son Chung, et al. Deep Lip Reading: a comparison of models and an online application, 2018, INTERSPEECH.
[31] David A. Ross, et al. Automatic Language Identification in music videos with low level audio and visual features, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Yonghong Yan, et al. A New Time-Frequency Attention Mechanism for TDNN and CNN-LSTM-TDNN, with Application to Language Identification, 2019, INTERSPEECH.
[33] Sriram Ganapathy, et al. Towards Relevance and Sequence Modeling in Language Recognition, 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[34] Joon Son Chung, et al. Utterance-level Aggregation for Speaker Recognition in the Wild, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Douglas A. Reynolds, et al. Language Recognition via i-vectors and Dimensionality Reduction, 2011, INTERSPEECH.
[36] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.
[37] Olivier Siohan, et al. Recurrent Neural Network Transducer for Audio-Visual Speech Recognition, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[38] Quan Wang, et al. Tuplemax Loss for Language Identification, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Stephen J. Cox, et al. Speaker independent visual-only language identification, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[40] Stephen J. Cox, et al. Language Identification Using Visual Features, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[41] David B. Pisoni, et al. Language identification from visual-only speech signals, 2010, Attention, perception & psychophysics.
[42] William M. Campbell, et al. Support vector machines for speaker and language recognition, 2006, Comput. Speech Lang.