Learning Visual Voice Activity Detection with an Automatically Annotated Dataset
[1] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[2] Radu Horaud, et al. A Comprehensive Analysis of Deep Regression, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Cláudio Rosito Jung, et al. Simultaneous-Speaker Voice Activity Detection and Localization Using Mid-Fusion of SVM and HMMs, 2014, IEEE Transactions on Multimedia.
[4] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[5] Sridha Sridharan, et al. Visual Voice Activity Detection Using Frontal versus Profile Views, 2011, 2011 International Conference on Digital Image Computing: Techniques and Applications.
[6] Philip J. B. Jackson, et al. A visual voice activity detection method with adaboosting, 2011.
[7] Helge Reikeras, et al. Audio-visual automatic speech recognition using Dynamic Bayesian Networks, 2011.
[8] Christian Jutten, et al. Two novel visual voice activity detectors based on appearance models and retinal filtering, 2007, 2007 15th European Signal Processing Conference.
[9] Apostol Natsev, et al. YouTube-8M: A Large-Scale Video Classification Benchmark, 2016, ArXiv.
[10] Daniil Kocharov, et al. Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features, 2016, TSD.
[11] Shrikanth Narayanan, et al. Toward Visual Voice Activity Detection for Unconstrained Videos, 2019, 2019 IEEE International Conference on Image Processing (ICIP).
[12] Ioannis Pitas, et al. Visual speech detection using mouth region intensities, 2006, 2006 14th European Signal Processing Conference.
[13] J.N. Gowdy, et al. CUAVE: A new audio-visual database for multimodal human-computer interface research, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[14] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[15] Cordelia Schmid, et al. P-CNN: Pose-Based CNN Features for Action Recognition, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[16] Jürgen Schmidhuber, et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures, 2005, Neural Networks.
[17] Alexandros Iosifidis, et al. Visual Voice Activity Detection in the Wild, 2016, IEEE Transactions on Multimedia.
[18] Juergen Luettin, et al. Audio-Visual Automatic Speech Recognition: An Overview, 2004.
[19] Georgios Tzimiropoulos, et al. How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks), 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[20] Cordelia Schmid, et al. Action recognition by dense trajectories, 2011, CVPR 2011.
[21] Ioannis Pitas, et al. Visual Lip Activity Detection and Speaker Detection Using Mouth Region Intensities, 2009, IEEE Transactions on Circuits and Systems for Video Technology.
[22] Peng Liu, et al. Voice activity detection using visual information, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[23] Juan Manuel Górriz, et al. Voice Activity Detection. Fundamentals and Speech Recognition System Robustness, 2007.
[24] Davis E. King, et al. Dlib-ml: A Machine Learning Toolkit, 2009, J. Mach. Learn. Res.
[25] Ivan Laptev, et al. On Space-Time Interest Points, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.
[26] Christian Jutten, et al. An Analysis of Visual Speech Information Applied to Voice Activity Detection, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[27] Bum-Jae You, et al. Robust visual speakingness detection using bi-level HMM, 2012, Pattern Recognit.
[28] Jon Barker, et al. An audio-visual corpus for speech perception and automatic speech recognition, 2006, The Journal of the Acoustical Society of America.
[29] Wenwu Wang, et al. Interference Reduction in Reverberant Speech Separation With Visual Voice Activity Detection, 2014, IEEE Transactions on Multimedia.
[30] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.