Improving Speaker Identification using Network Knowledge in Criminal Conversational Data

Criminal investigations rely on the collection of conversational data. The identity of each speaker must be established in order to build a criminal network or to improve the accuracy of an existing one. Investigators use social network analysis tools to identify the most central character and the different communities within the network. We introduce the Crime Scene Investigation (CSI) television show as a candidate source of criminal conversational data. We also introduce the metric of conversation accuracy in the context of criminal investigations. In this paper, we improve a speaker identification baseline by re-ranking candidate speakers based on the frequency of previous interactions between speakers and on the topology of the criminal network. The proposed method can be applied to conversations involving two or more speakers. On CSI data, our approach improves on the baseline speaker accuracy by 1.3% absolute (1.5% relative) and on the baseline conversation accuracy by 3.7% absolute (4.7% relative).
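The re-ranking idea can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration, not the authors' method: the function name `rerank`, the weight `alpha`, the add-one smoothing, and the speaker labels are all hypothetical. The sketch combines a baseline acoustic score with the log of a smoothed interaction-frequency prior for a two-speaker conversation in which one speaker is already known.

```python
import math

def rerank(acoustic_scores, known_speaker, interaction_counts, alpha=1.0):
    """Re-rank candidate speakers for the unknown side of a conversation.

    acoustic_scores: dict candidate -> baseline score (higher is better).
    interaction_counts: dict frozenset({a, b}) -> number of past conversations.
    alpha: hypothetical weight on the network prior (0 keeps the baseline order).
    """
    # Total past interactions that involve the known speaker.
    total = sum(n for pair, n in interaction_counts.items() if known_speaker in pair)
    combined = {}
    for cand, score in acoustic_scores.items():
        count = interaction_counts.get(frozenset((known_speaker, cand)), 0)
        # Add-one smoothing so unseen pairs keep a non-zero prior (assumption).
        prior = (count + 1) / (total + len(acoustic_scores))
        combined[cand] = score + alpha * math.log(prior)
    # Best candidate first.
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical example: speaker "A" is known; "B" and "C" are candidates.
scores = {"B": 1.0, "C": 1.2}
counts = {frozenset(("A", "B")): 20, frozenset(("A", "C")): 2}
baseline_order = rerank(scores, "A", counts, alpha=0.0)
network_order = rerank(scores, "A", counts, alpha=1.0)
```

With `alpha=0.0` the ranking follows the acoustic scores alone; with `alpha=1.0` the candidate who has interacted more often with the known speaker moves to the top. Using network topology (for example, shared neighbours or community membership) would require a different prior and is not covered by this sketch.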