Source phone identification using sketches of features

Speech recordings carry useful information about the devices used to capture them. Here, acquisition device identification is studied using ‘sketches of features’ as intrinsic device characteristics. The starting point is a large raw feature vector, obtained either by averaging the log-spectrogram of a speech recording along the time axis or by stacking the parameters of each component of a Gaussian mixture model that models the speech recorded by a specific device. Features of reduced size are then extracted by mapping these raw feature vectors into a low-dimensional space. The mapping preserves the ‘distance properties’ of the raw feature vectors. Each coordinate of the sketch is obtained by taking the inner product of the raw feature vector with a vector of independent identically distributed random variables drawn from a p-stable distribution. State-of-the-art classifiers, such as a sparse representation-based classifier or support vector machines, applied to the sketches yield an identification accuracy exceeding 94% on a set of eight landline telephone handsets from the Lincoln-Labs Handset Database. Perfect identification is reported for a set of 21 cell-phones of various models from seven different brands.
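
The sketching step described above can be illustrated with a minimal example. It is only a sketch under stated assumptions, not the authors' implementation: the function names `stable_sketch` and `estimate_distance`, the sketch size `k`, and the restriction to p = 2 (Gaussian) and p = 1 (Cauchy) are choices made here for illustration. Each sketch coordinate is the inner product of the raw feature vector with one row of i.i.d. p-stable random variables, and the distance between two raw vectors is estimated from their sketches alone.

```python
import numpy as np

def stable_sketch(x, k, p=2, seed=0):
    """Map a raw feature vector x (length d) to a k-dimensional sketch.

    Hypothetical helper: each sketch coordinate is the inner product of x
    with a row of i.i.d. p-stable variables. The same seed must be used
    for every vector so that all sketches share one projection matrix.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[-1]
    if p == 2:
        R = rng.standard_normal((k, d))   # Gaussian is 2-stable
    elif p == 1:
        R = rng.standard_cauchy((k, d))   # Cauchy is 1-stable
    else:
        raise ValueError("this illustration supports only p=1 and p=2")
    return R @ x

def estimate_distance(sx, sy, p=2):
    """Estimate the l_p distance of the raw vectors from their sketches."""
    diff = sx - sy
    if p == 2:
        # Entries of diff are N(0, ||x - y||_2^2): use the root mean square.
        return np.sqrt(np.mean(diff ** 2))
    # Entries of diff are Cauchy with scale ||x - y||_1: use the median of |diff|.
    return np.median(np.abs(diff))

# Usage with synthetic stand-ins for two raw feature vectors.
rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
y = rng.standard_normal(2048)
for p in (1, 2):
    sx = stable_sketch(x, k=1024, p=p)
    sy = stable_sketch(y, k=1024, p=p)
    true = np.linalg.norm(x - y, ord=p)
    print(p, true, estimate_distance(sx, sy, p=p))
```

The estimate fluctuates around the true distance, and the spread shrinks as `k` grows. This is the sense in which the low-dimensional sketches preserve the ‘distance properties’ of the raw vectors, and why a classifier can operate on the sketches instead of the raw features.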
