Iterative PLDA Adaptation for Speaker Diarization

This paper investigates iterative PLDA adaptation for cross-show speaker diarization based on an i-vector framework and applied to small collections of French TV archives. Using the target collection itself for unsupervised adaptation, PLDA parameters are iteratively tuned, while score normalization is applied so that the iterations converge. Performance is compared across combinations of target and external data used for training and adaptation. Experiments on two distinct target corpora show that the proposed framework can gradually improve an existing system trained on external annotated data. These results indicate that speaker diarization on small collections of unlabeled audio archives need only rely on the availability of a sufficient bootstrap system, which can then be incrementally adapted to each target collection. The proposed framework also widens the range of acceptable speaker clustering thresholds for a given performance objective.
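The adaptation loop summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the abstract does not state: a two-covariance PLDA estimated from class moments (rather than by EM), average-linkage agglomerative clustering with a fixed threshold, and a simple mean/standard-deviation normalization of the target score matrix. All function names (`train_plda`, `plda_llr`, `normalize_scores`, `ahc`, `iterative_adaptation`) are hypothetical. Each iteration scores the target i-vectors with the current PLDA, clusters them, and re-estimates the PLDA on the external labelled data plus the target pseudo-labels. It stops once the pseudo-labels no longer change.

```python
from typing import NamedTuple
import numpy as np


class PLDA(NamedTuple):
    """Two-covariance PLDA (assumed model): x = mean + y + e, y ~ N(0, B), e ~ N(0, W)."""
    mean: np.ndarray
    B: np.ndarray  # between-speaker covariance
    W: np.ndarray  # within-speaker covariance


def train_plda(X, labels, eps=1e-3):
    """Moment-based estimate from labelled i-vectors (an assumption; not EM)."""
    n, d = X.shape
    mean = X.mean(axis=0)
    B, W = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu = Xc.mean(axis=0)
        W += (Xc - mu).T @ (Xc - mu)
        dm = (mu - mean)[:, None]
        B += len(Xc) * (dm @ dm.T)
    # eps * I keeps both matrices invertible (e.g. when every cluster is a singleton).
    return PLDA(mean, B / n + eps * np.eye(d), W / n + eps * np.eye(d))


def plda_llr(plda, X):
    """Pairwise log-likelihood ratios: same speaker vs. different speakers."""
    d = X.shape[1]
    Xc = X - plda.mean
    T = plda.B + plda.W
    sigma_same = np.block([[T, plda.B], [plda.B, T]])
    inv_same = np.linalg.inv(sigma_same)
    A, C = inv_same[:d, :d], inv_same[:d, d:]  # inverse is block-symmetric
    T_inv = np.linalg.inv(T)
    qa = np.einsum("nd,de,ne->n", Xc, A, Xc)
    qt = np.einsum("nd,de,ne->n", Xc, T_inv, Xc)
    quad = qa[:, None] + qa[None, :] + 2 * (Xc @ C @ Xc.T) - qt[:, None] - qt[None, :]
    _, ld_same = np.linalg.slogdet(sigma_same)
    _, ld_T = np.linalg.slogdet(T)
    return -0.5 * quad - 0.5 * (ld_same - 2 * ld_T)


def normalize_scores(S):
    """Hypothetical normalization: standardize the off-diagonal scores of the collection."""
    off = ~np.eye(len(S), dtype=bool)
    return (S - S[off].mean()) / (S[off].std() + 1e-12)


def ahc(S, threshold):
    """Average-linkage clustering; merge while the best cluster-pair score >= threshold."""
    clusters = [[i] for i in range(len(S))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = S[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]
    labels = np.empty(len(S), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def iterative_adaptation(X_ext, y_ext, X_tgt, threshold=0.0, n_iter=5):
    """Bootstrap on external labelled data, then adapt on target pseudo-labels."""
    plda = train_plda(X_ext, y_ext)
    labels = None
    for _ in range(n_iter):
        new_labels = ahc(normalize_scores(plda_llr(plda, X_tgt)), threshold)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # pseudo-labels are stable: stop iterating
        labels = new_labels
        # Offset target labels so they never collide with external speaker IDs.
        y_all = np.concatenate([y_ext, labels + y_ext.max() + 1])
        plda = train_plda(np.vstack([X_ext, X_tgt]), y_all)
    return plda, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 5
    X_ext = np.repeat(rng.normal(0, 3, (6, d)), 8, axis=0) + rng.normal(0, 1, (48, d))
    y_ext = np.repeat(np.arange(6), 8)
    # Synthetic "target" collection with a mean shift to mimic a domain mismatch.
    X_tgt = np.repeat(rng.normal(2, 3, (4, d)), 5, axis=0) + rng.normal(0, 1, (20, d))
    plda, labels = iterative_adaptation(X_ext, y_ext, X_tgt)
    print(len(np.unique(labels)), "clusters found")
```

The `threshold` argument controls how aggressively clusters merge. Sweeping it over a range of values is one way to probe the abstract's claim that adaptation widens the range of acceptable clustering thresholds, though this toy setup makes no attempt to reproduce the paper's experiments.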
