Telephone handset identification using sparse representations of spectral feature sketches

Speech signals convey useful information about the recording devices used to capture them. Here, acquisition device identification is studied using sketches of spectral features (SSFs) as intrinsic device fingerprints. The SSFs are extracted from the speech signal by first averaging its spectrogram along the time axis and then mapping the resulting mean spectrogram into a low-dimensional space, such that the “distance properties” of the high-dimensional mean spectrograms are preserved. Such a mapping is obtained by taking the inner product of the mean spectrogram with a vector of independent, identically distributed random variables drawn from a p-stable distribution. By applying a sparse-representation-based classifier to the SSFs, a state-of-the-art identification accuracy exceeding 95% has been measured on a set of 8 telephone handsets from the Lincoln-Labs Handset Database (LLHDB).
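
The SSF extraction described above can be illustrated with a minimal sketch, which is not the authors' implementation. The function names, the frame length, the hop size, the Hann window, the sketch dimension of 32, and the use of several projection vectors stacked as matrix rows are all assumptions made for illustration. Only two p-stable cases are covered: the Gaussian distribution (p = 2) and the Cauchy distribution (p = 1). The sparse-representation classifier is not shown.

```python
import numpy as np

def mean_spectrogram(signal, frame_len=256, hop=128):
    """Average the magnitude spectrogram along the time axis.

    frame_len, hop and the Hann window are illustrative assumptions,
    not values taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len//2 + 1)
    return magnitude.mean(axis=0)                    # one value per frequency bin

def stable_projection(dim_in, dim_out, p, rng):
    """Random matrix with i.i.d. p-stable entries (hypothetical helper).

    Each row plays the role of one random vector; the inner product of a
    row with the mean spectrogram gives one sketch coordinate.
    """
    if p == 2:
        return rng.standard_normal((dim_out, dim_in))  # Gaussian is 2-stable
    if p == 1:
        return rng.standard_cauchy((dim_out, dim_in))  # Cauchy is 1-stable
    raise ValueError("this sketch only covers p = 1 and p = 2")

def sketch(mean_spec, projection):
    """Map a mean spectrogram to its low-dimensional sketch."""
    return projection @ mean_spec

# Usage on a synthetic signal standing in for a speech recording.
rng = np.random.default_rng(0)
signal = rng.standard_normal(8000)
mean_spec = mean_spectrogram(signal)                   # 129 frequency bins
projection = stable_projection(mean_spec.size, 32, p=1, rng=rng)
ssf = sketch(mean_spec, projection)                    # 32-dimensional SSF
```

Because the mapping is linear, the difference of two sketches equals the sketch of the difference of the two mean spectrograms. Combined with the stability property, this is what allows distances to be estimated in the sketch space: for p = 1, for example, the median of the absolute sketch differences estimates the l1 distance between the original vectors.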
