Audio Recording Location Identification Using Acoustic Environment Signature

An audio recording is subject to a number of possible distortions and artifacts. Consider, for example, artifacts due to acoustic reverberation and background noise. Acoustic reverberation depends on the shape and composition of the room, and it causes temporal and spectral smearing of the recorded sound. Background noise, on the other hand, depends on the activity of secondary audio sources present in the evidentiary recording. Extracting these acoustic cues from an audio recording is an important but challenging task. Temporal changes in the estimated reverberation and background noise can be used for dynamic acoustic environment identification (AEI), audio forensics, and analysis in ballistic settings. We describe a statistical technique that uses spectral subtraction to estimate the amount of reverberation and nonlinear filtering, based on particle filtering, to estimate the background noise. The effectiveness of the proposed method is tested on a data set of speech recordings from two human speakers (one male and one female), made in eight acoustic environments using four commercial-grade microphones. Performance is evaluated under several experimental settings: microphone-independent AEI, semi-blind and full-blind AEI, and robustness to MP3 compression. The proposed framework is also evaluated using Temporal Derivative-based Spectrum and Mel-Cepstrum (TDSM) features. Experimental results show that the proposed method improves AEI performance compared with the direct method (i.e., extracting the feature vector directly from the audio recording). The results also show that the proposed scheme is robust to MP3 compression attacks.
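As a rough illustration of the spectral subtraction idea mentioned above, the following sketch applies generic magnitude spectral subtraction in the style of Boll. It is not the reverberation estimator proposed in the paper. The function names, the frame length and hop size, the spectral floor, and the assumption that the first few frames contain only the component to be subtracted are all hypothetical choices made for illustration.

```python
import numpy as np


def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping Hann-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)


def spectral_subtraction(x, noise_frames=10, frame_len=512, hop=256, floor=0.02):
    """Generic magnitude spectral subtraction (illustrative only).

    Assumes the first `noise_frames` frames contain only the unwanted
    component, so their average magnitude spectrum can serve as its estimate.
    Returns the subtracted magnitude spectrogram and the estimated spectrum.
    """
    frames = frame_signal(x, frame_len, hop)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Average magnitude spectrum of the leading frames
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract and clamp to a small spectral floor to avoid negative magnitudes
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)
    return clean_mag, noise_mag
```

The estimated spectrum returned as the second value is the kind of quantity that could feed a feature vector for environment classification, although how the paper actually constructs its features is not specified here.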
