Similarity Visualization for the Grouping of Forensic Speech Recordings

In a forensic phone wiretapping investigation, a major problem is obtaining a full picture of the speakers involved. Typically, the wiretapped speech recordings are grouped using a clustering tool. The main disadvantage of such an approach is that, in a bootstrapped scenario, grouping errors accumulate. In this paper, we propose a visual approach to finding similar speech recordings that probably stem from the same speaker. We first model the speech recordings and define suitable similarity measures between recordings. Then, through an approximate 2-D visualization of the inter-speech similarities, the investigator can identify clear groups of recordings as well as recordings that are harder to differentiate. We conducted extensive experiments on phone data from 50 speakers, with 2 recordings per speaker. We evaluated the quality of the 2-D visualization relative to the original high-dimensional similarities. With the original high-dimensional similarity measure, the nearest recording is almost always the one from the same speaker. In the 2-D visualization, a recording of the same speaker is, on average over all speech recordings, among the 10 nearest recordings.
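The evaluation described above can be illustrated with a minimal sketch. The abstract does not say which projection method or similarity measure is used, so everything below is an assumption made for illustration: classical (Torgerson) multidimensional scaling stands in for the 2-D projection, Euclidean distance between synthetic feature vectors stands in for the similarity between speaker models, and the helper names `classical_mds`, `pairwise_distances`, and `same_speaker_rank` are hypothetical. The data are random and only mimic the setup of 50 speakers with 2 recordings each.

```python
import numpy as np


def classical_mds(dist, dim=2):
    """Embed points in `dim` dimensions from a symmetric distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]  # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def same_speaker_rank(dist, labels):
    """For each recording, the neighbour rank (1 = nearest) of the closest
    other recording carrying the same speaker label."""
    n = dist.shape[0]
    ranks = np.empty(n, dtype=int)
    for i in range(n):
        order = np.argsort(dist[i])
        order = order[order != i]  # drop the recording itself
        ranks[i] = int(np.argmax(labels[order] == labels[i])) + 1
    return ranks


# Synthetic stand-in data (assumption): 50 speakers, 2 recordings each,
# each recording a noisy copy of its speaker's 20-dimensional centre.
rng = np.random.default_rng(0)
n_speakers, n_features = 50, 20
centers = rng.normal(size=(n_speakers, n_features))
labels = np.repeat(np.arange(n_speakers), 2)
noise = 0.1 * rng.normal(size=(2 * n_speakers, n_features))
recordings = centers[labels] + noise

high_dist = pairwise_distances(recordings)   # original high-dimensional space
embedding = classical_mds(high_dist, dim=2)  # approximate 2-D layout
low_dist = pairwise_distances(embedding)

high_ranks = same_speaker_rank(high_dist, labels)
low_ranks = same_speaker_rank(low_dist, labels)
print("mean same-speaker rank, high-dimensional:", high_ranks.mean())
print("mean same-speaker rank, 2-D:", low_ranks.mean())
```

The two printed means correspond to the two quantities the abstract compares: how close the same-speaker recording sits in the original similarity space, and how much of that neighbourhood structure survives the projection to two dimensions. The numbers produced by this synthetic data say nothing about the results reported in the paper.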
