Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast
Anindya Roy | Claude Barras | Hervé Bredin | Viet Bac Le
[1] Georges Quénot, et al. Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both?, 2013, INTERSPEECH.
[2] Sylvain Meignier, et al. Automatic named identification of speakers using diarization and ASR systems, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[3] Zhu Liu, et al. Using Both Audio and Visual Clues, 2000.
[4] Claude Barras, et al. On the use of GSV-SVM for Speaker Diarization and Tracking, 2010, Odyssey.
[5] Julie Mauclair, et al. Speaker Diarization: About whom the Speaker is Talking?, 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.
[6] Sophie Rosset, et al. Models Cascade for Tree-Structured Named Entity Detection, 2011, IJCNLP.
[7] Jean-Luc Gauvain, et al. The LIMSI Broadcast News transcription system, 2002, Speech Commun.
[8] Chin-Hui Lee, et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.
[9] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.
[10] Olivier Galibert, et al. The REPERE Corpus: a multimodal corpus for person recognition, 2012, LREC.
[11] Andrew W. Moore, et al. X-means: Extending K-means with Efficient Estimation of the Number of Clusters, 2000, ICML.
[12] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.
[13] Rainer Stiefelhagen, et al. Semi-supervised Learning with Constraints for Person Identification in Multimedia Data, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[14] Philip S. Yu, et al. Clustering on Complex Graphs, 2008, AAAI.
[15] Sridha Sridharan, et al. Feature warping for robust speaker verification, 2001, Odyssey.
[16] Christos Faloutsos, et al. MMSS: graph-based multi-modal story-oriented video summarization and retrieval, 2004.
[17] Marcel Worring, et al. Content-Based Image Retrieval at the End of the Early Years, 2000, IEEE Trans. Pattern Anal. Mach. Intell.
[18] Ben Taskar, et al. Talking pictures: Temporal grouping and dialog-supervised person recognition, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[19] L. Lamel, et al. A comparative study using manual and automatic transcriptions for diarization, 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005.
[20] H. Hermansky, et al. Perceptual linear predictive (PLP) analysis of speech, 1990, The Journal of the Acoustical Society of America.
[21] Christos Faloutsos, et al. Automatic multimedia cross-modal correlation discovery, 2004, KDD.
[22] R. Smith, et al. An Overview of the Tesseract OCR Engine, 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).
[23] Ronan Guivarch, et al. On a Strategy for Spectral Clustering with Parallel Computation, 2010, VECPAR.
[24] Sue Tranter. Who Really Spoke When? Finding Speaker Turns and Identities in Broadcast News Audio, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[25] Avideh Zakhor, et al. Applications of Video-Content Analysis and Retrieval, 2002, IEEE Multim.
[26] Georges Quénot, et al. From Text Detection in Videos to Person Identification, 2012, 2012 IEEE International Conference on Multimedia and Expo.
[27] S. Chen, et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, 1998.
[28] Gérard Chollet, et al. Audiovisual Speech Synchrony Measure: Application to Biometrics, 2007, EURASIP J. Adv. Signal Process.
[29] Christopher D. Manning, et al. Enforcing Transitivity in Coreference Resolution, 2008, ACL.
[30] Jean-Luc Gauvain, et al. Multistage speaker diarization of broadcast news, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[31] Hervé Bredin, et al. Integer linear programming for speaker diarization and cross-modal identification in TV broadcast, 2013, INTERSPEECH.
[32] Jean-Loup Guillaume, et al. Fast unfolding of communities in large networks, 2008, arXiv:0803.0476.
[33] Olivier Galibert, et al. The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, 2012, LREC.
[34] M. E. J. Newman, et al. Modularity and community structure in networks, 2006, Proceedings of the National Academy of Sciences of the United States of America.
[35] Pascale Sébillot, et al. A Scalable Video Search Engine Based on Audio Content Indexing and Topic Segmentation, 2011, ArXiv.
[36] Mickael Rouvier, et al. I-vectors and ILP clustering adapted to cross-show speaker diarization, 2012, INTERSPEECH.
[37] Jean-Luc Gauvain, et al. Partitioning and transcription of broadcast news data, 1998, ICSLP.
[38] Douglas A. Reynolds, et al. Speaker Verification Using Adapted Gaussian Mixture Models, 2000, Digit. Signal Process.
[39] Paul Deléglise, et al. Extracting true speaker identities from transcriptions, 2007, INTERSPEECH.
[40] Georges Quénot, et al. Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast, 2012, INTERSPEECH.