Audio-Video detection of the active speaker in meetings
[1] Matti Pietikäinen, et al. A review of recent advances in visual speech decoding, 2014, Image Vis. Comput.
[2] Quan Wang, et al. Speaker Diarization with LSTM, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Maja Pantic, et al. Visual-Only Recognition of Normal, Whispered and Silent Speech, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Lorenzo Torresani, et al. Learning Spatiotemporal Features with 3D Convolutional Networks, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[5] Marek Hrúz, et al. Convolutional Neural Network for speaker change detection in telephone speaker diarization system, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Jean Carletta, et al. Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus, 2007, Lang. Resour. Evaluation.
[7] Jean-Marc Odobez, et al. Learning Multimodal Temporal Representation for Dubbing Detection in Broadcast Media, 2016, ACM Multimedia.
[8] Rita Cucchiara, et al. POSEidon: Face-from-Depth for Driver Pose Estimation, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Pavel Korshunov, et al. Tampered Speaker Inconsistency Detection with Phonetically Aware Audio-visual Features, 2019, ICML 2019.
[10] Carlos Busso, et al. End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models, 2018, Speech Commun.
[11] Joon Son Chung, et al. Deep Audio-Visual Speech Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Peter Robinson, et al. 3D Constrained Local Model for rigid and non-rigid facial tracking, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[13] Yann LeCun, et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[14] Gang Liu, et al. A Differential Approach for Gaze Estimation, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[15] Joon Son Chung, et al. Voxceleb: Large-scale speaker verification in the wild, 2020, Comput. Speech Lang.
[16] Rong Chen, et al. A PCA Based Visual DCT Feature Extraction Method for Lip-Reading, 2006, 2006 International Conference on Intelligent Information Hiding and Multimedia.
[17] Jonas Beskow, et al. Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially Aware Language Acquisition, 2017, IEEE Transactions on Cognitive and Developmental Systems.
[18] Nicholas W. D. Evans, et al. The EURECOM Submission to the First DIHARD Challenge, 2018, INTERSPEECH.
[19] Paavo Alku, et al. Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction, 2018, Speech Commun.
[20] Joon Son Chung, et al. Utterance-level Aggregation for Speaker Recognition in the Wild, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Petr Motlícek, et al. Deep Neural Networks for Multiple Speaker Detection and Localization, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[22] John H. L. Hansen, et al. An unsupervised visual-only voice activity detection approach using temporal orofacial features, 2015, INTERSPEECH.
[23] Sivaji Bandyopadhyay, et al. Says Who? Deep Learning Models for Joint Speech Recognition, Segmentation and Diarization, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Pierre-Michel Bousquet, et al. Speaker Modeling Using Local Binary Decisions, 2011, INTERSPEECH.
[25] Horst Bischof, et al. Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization, 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).
[26] Rama Chellappa, et al. HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.