Mobile Phone Clustering From Speech Recordings Using Deep Representation and Spectral Clustering

Acquisition device recognition has received considerable attention in the forensic community over the past decade, especially in digital image forensics. In contrast, acquisition device clustering from speech recordings is a new problem: it aims to merge the recordings acquired by the same device into a single cluster, without prior information about the recordings and without training classifiers in advance. In this paper, we propose a method for mobile phone clustering from speech recordings that uses a new deep representation feature and a spectral clustering algorithm. The new feature is learned by a deep auto-encoder network to represent the intrinsic trace left behind by each phone in the recordings, and spectral clustering is used to merge recordings acquired by the same phone into a single cluster. We discuss the impact of the structure of the deep auto-encoder network on the performance of the new feature and compare the new feature with other features. We also compare the proposed method with other methods and evaluate it under special conditions. The results show that the proposed method is effective under these conditions and that the new feature outperforms the other features.
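The clustering stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each recording has already been mapped to a fixed-length feature vector (random placeholder vectors stand in for the deep auto-encoder representation), it assumes the number of phones is known, and it uses a common normalized-affinity form of spectral clustering with a Gaussian kernel. The function names `cluster_recordings` and `_kmeans` and the parameter `sigma` are hypothetical.

```python
import numpy as np


def _kmeans(points, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)

    labels = None
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def cluster_recordings(features, n_clusters, sigma=1.0):
    """Group feature vectors (one row per recording) by spectral clustering.

    Assumption: `features` stands in for the learned deep representation;
    `n_clusters` (the number of phones) is assumed to be known.
    """
    features = np.asarray(features, dtype=float)

    # Gaussian affinity between every pair of recordings.
    sq = np.sum(features ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * features @ features.T, 0.0)
    affinity = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalisation D^{-1/2} A D^{-1/2}.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Eigenvectors of the largest eigenvalues (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(normalized)
    embedding = eigvecs[:, -n_clusters:]

    # Row-normalise the embedding, then cluster its rows with k-means.
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.maximum(norms, 1e-12)
    return _kmeans(embedding, n_clusters)


if __name__ == "__main__":
    # Synthetic stand-in data: 3 hypothetical phones, 20 recordings each.
    rng = np.random.default_rng(0)
    feats = np.vstack(
        [rng.normal(loc=10.0 * k, scale=0.5, size=(20, 8)) for k in range(3)]
    )
    print(cluster_recordings(feats, n_clusters=3))
```

In practice the kernel width `sigma` and the number of clusters would need to be chosen or estimated for real data; the values here are only suitable for the synthetic example.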
