A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking
Tomohiro Nakatani | Shoko Araki | Marc Delcroix | Keisuke Kinoshita | Christopher Schymura | Tsubasa Ochiai | Dorothea Kolossa
[1] Aggelos K. Katsaggelos,et al. Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition , 2008, 2008 15th IEEE International Conference on Image Processing.
[2] Archontis Politis,et al. Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network , 2017, 2018 26th European Signal Processing Conference (EUSIPCO).
[3] Aggelos K. Katsaggelos,et al. Audiovisual Fusion: Challenges and New Approaches , 2015, Proceedings of the IEEE.
[4] Maximo Cobos,et al. A Modified SRP-PHAT Functional for Robust Real-Time Sound Source Localization With Scalable Spatial Sampling , 2011, IEEE Signal Processing Letters.
[5] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[6] Jean-Marc Odobez,et al. Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[7] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[8] Ivan Dokmanic,et al. Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Martin Heckmann,et al. Environmentally robust audio-visual speaker identification , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[10] DeLiang Wang,et al. Binaural Localization of Multiple Sources in Reverberant and Noisy Environments , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[11] Tuomas Virtanen,et al. Detection and Classification of Acoustic Scenes and Events , 2018.
[12] Radu Horaud,et al. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[13] Gwenn Englebienne,et al. Multimodal Speaker Diarization , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[14] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[15] Junichi Yamagishi,et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2017.
[16] Josephine Sullivan,et al. One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[17] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[18] Stephan Gerlach,et al. 2D Audio-Visual Localization in Home Environments using a Particle Filter , 2012, ITG Conference on Speech Communication.
[19] Ole Winther,et al. A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning , 2017, NIPS.
[20] Junichi Yamagishi,et al. SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2016.
[21] Dorothea Kolossa,et al. Extending Linear Dynamical Systems with Dynamic Stream Weights for Audiovisual Speaker Localization , 2018, 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC).
[22] Xin Liu,et al. Audio-Visual Speaker Recognition via Multi-modal Correlated Neural Networks , 2016, 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops (WIW).
[23] Maja Pantic,et al. End-to-End Audiovisual Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] H.K. Ekenel,et al. Kalman filters for audio-video source localization , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005.
[25] Georges Linarès,et al. Audiovisual speaker diarization of TV series , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Ahmed Hussen Abdelaziz. Comparing Fusion Models for DNN-Based Audiovisual Continuous Speech Recognition , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[27] Harit Pandya,et al. Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces , 2019, ICML.
[28] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[29] Dorothea Kolossa,et al. Audiovisual Speaker Tracking Using Nonlinear Dynamical Systems With Dynamic Stream Weights , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[30] Sergey Levine,et al. Backprop KF: Learning Discriminative Deterministic State Estimators , 2016, NIPS.