Who is Really Talking? A Visual-Based Speaker Diarization Strategy

Speaker activity at the Canary Islands Parliament is recorded and later annotated manually. This task can be modelled as a diarization problem, that is, automatically annotating who is speaking and when. In this paper, we propose using visual cues to solve the diarization task. This approach requires detecting individuals, determining which one is speaking, and extracting features for matching. To assess the performance of our proposal, we evaluate four different strategies based on visual shot features.
