"Sheldon speaking, Bonjour!": Leveraging Multilingual Tracks for (Weakly) Supervised Speaker Identification
Alexandre Allauzen | Anindya Roy | Hervé Bredin | Nicolas Pécheux
[1] Andrew Zisserman,et al. “Who are you?” - Learning person specific classifiers from video , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[2] Rainer Stiefelhagen,et al. Semi-supervised Learning with Constraints for Person Identification in Multimedia Data , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[3] José B. Mariño,et al. N-gram-based Machine Translation , 2006, CL.
[4] Camille Guinaudeau,et al. TVD: A Reproducible and Multiply Aligned TV Series Dataset , 2014, LREC.
[5] Sue Tranter. Who Really Spoke When? Finding Speaker Turns and Identities in Broadcast News Audio , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[6] Francisco Casacuberta,et al. Machine Translation with Inferred Stochastic Finite-State Transducers , 2004, Computational Linguistics.
[7] Andrew Zisserman,et al. “Hello! My name is... Buffy” - Automatic Naming of Characters in TV Video , 2006, BMVC.
[8] Quoc-Khanh Do,et al. Limsi @ Wmt13 , 2013, WMT@ACL.
[9] Julie Mauclair,et al. Speaker Diarization: About whom the Speaker is Talking ? , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.
[10] Andrew Zisserman,et al. Efficient Visual Search of Videos Cast as Text Retrieval , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] José B. Mariño,et al. Ncode: an Open Source Bilingual N-gram SMT Toolkit , 2011, Prague Bull. Math. Linguistics.
[12] Georges Quénot,et al. Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both? , 2013, INTERSPEECH.
[13] Rainer Stiefelhagen,et al. “Knock! Knock! Who is it?” probabilistic person identification in TV-series , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[14] Chin-Hui Lee,et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..
[15] Jean-Luc Gauvain,et al. Multistage speaker diarization of broadcast news , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[16] S. Chen,et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998 .
[17] Christoph Tillmann,et al. A Unigram Orientation Model for Statistical Machine Translation , 2004, NAACL.
[18] Philipp Koehn,et al. Findings of the 2013 Workshop on Statistical Machine Translation , 2013, WMT@ACL.
[19] Philipp Koehn,et al. Findings of the 2012 Workshop on Statistical Machine Translation , 2012, WMT@NAACL-HLT.
[20] Giuseppe Attardi,et al. Proceedings of the Eighth Workshop on Statistical Machine Translation , 2013 .
[21] José B. Mariño,et al. Improving statistical MT by coupling reordering and decoding , 2006, Machine Translation.
[22] Gerald Friedland,et al. Joke-o-mat: browsing sitcoms punchline by punchline , 2009, ACM Multimedia.
[23] Thomas Fillon,et al. YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software , 2010, ISMIR.
[24] Olivier Galibert,et al. A presentation of the REPERE challenge , 2012, 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI).
[25] Jean-Luc Gauvain,et al. Feature and score normalization for speaker verification of cellular data , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[26] Stuart M. Shieber,et al. Towards Robust Context-Sensitive Sentence Alignment for Monolingual Corpora , 2006, EACL.
[27] Ben Taskar,et al. Talking pictures: Temporal grouping and dialog-supervised person recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[28] Sophie Rosset,et al. Person Instance Graphs for Named Speaker Identification in TV Broadcast , 2014, Odyssey.
[29] Hervé Bredin,et al. Segmentation of TV shows into scenes using speaker diarization and speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Miles Osborne,et al. Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.
[31] Alvin F. Martin,et al. The NIST 1999 Speaker Recognition Evaluation - An Overview , 2000, Digit. Signal Process..
[32] Adam Lopez,et al. Proceedings of the Seventh Workshop on Statistical Machine Translation , 2012 .
[33] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..
[34] Paul Deléglise,et al. Extracting true speaker identities from transcriptions , 2007, INTERSPEECH.
[35] Philipp Koehn,et al. Findings of the 2014 Workshop on Statistical Machine Translation , 2014, WMT@ACL.
[36] Hervé Bredin,et al. Integer linear programming for speaker diarization and cross-modal identification in TV broadcast , 2013, INTERSPEECH.
[37] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[38] Douglas A. Reynolds,et al. An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[39] L. Lamel,et al. A comparative study using manual and automatic transcriptions for diarization , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..
[40] Sylvain Meignier,et al. Automatic named identification of speakers using diarization and ASR systems , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[41] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.