Improving Speaker Diarization of TV Series using Talking-Face Detection and Clustering
[1] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res.
[2] Rainer Stiefelhagen,et al. Improved weak labels using contextual cues for person identification in videos , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
[3] Andrew Zisserman,et al. "Hello! My name is... Buffy" -- Automatic Naming of Characters in TV Video , 2006, BMVC.
[4] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.
[5] Josephine Sullivan,et al. One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[6] Nicholas W. D. Evans,et al. Speaker Diarization: A Review of Recent Research , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[7] Thomas Fillon,et al. YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software , 2010, ISMIR.
[8] Mahadev Satyanarayanan,et al. OpenFace: A general-purpose face recognition library with mobile applications , 2016.
[9] Jean-Luc Gauvain,et al. Minimum word error training of RNN-based voice activity detection , 2015, INTERSPEECH.
[10] Georges Linarès,et al. Constrained speaker diarization of TV series based on visual patterns , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[11] Michael Felsberg,et al. Accurate Scale Estimation for Robust Visual Tracking , 2014, BMVC.
[12] Camille Guinaudeau,et al. TVD: A Reproducible and Multiply Aligned TV Series Dataset , 2014, LREC.
[13] David D. Cox,et al. Hyperopt: A Python Library for Optimizing the Hyperparameters of Machine Learning Algorithms , 2013, SciPy.
[14] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[15] Thierry Bazillon,et al. Speaker diarization of heterogeneous web video files: A preliminary study , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] S. Chen,et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998.
[17] Sylvain Meignier,et al. Automatic named identification of speakers using diarization and ASR systems , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] Andrew Zisserman,et al. Taking the bite out of automated naming of characters in TV video , 2009, Image Vis. Comput.
[19] Xiaojun Wu,et al. Convergence analysis and improvements of quantum-behaved particle swarm optimization , 2012, Inf. Sci..
[20] David D. Cox,et al. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures , 2013, ICML.
[21] Alexandre Allauzen,et al. "Sheldon speaking, Bonjour!": Leveraging Multilingual Tracks for (Weakly) Supervised Speaker Identification , 2014, ACM Multimedia.
[22] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Rainer Stiefelhagen,et al. Book2Movie: Aligning video scenes with book chapters , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] William J. Christmas,et al. A Study on Automatic Shot Change Detection , 1998, ECMAST.