Audio splicing detection and localization using environmental signature

Audio splicing is one of the most common manipulation techniques encountered in audio forensics. In this paper, the magnitudes of the acoustic channel impulse response and the ambient noise are proposed as an environmental signature. Specifically, spliced audio segments are detected from the magnitude correlation between query frames and reference frames, using a statistically optimal threshold. The detection accuracy is further refined by comparing adjacent frames. The effectiveness of the proposed method is tested on two data sets: one generated from the TIMIT database, and one recorded in four acoustic environments using a commercial-grade microphone. Experimental results show that the proposed method not only detects the presence of spliced frames, but also localizes the forged segments with near-perfect accuracy. Comparisons show that the proposed scheme achieves higher identification accuracy than previous schemes. A real-world meeting recording database (the AMI corpus) is also used to verify the effectiveness of the proposed method in practical applications.
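The detection steps summarized in the abstract (correlate each query frame with a reference, threshold the result, then refine using adjacent frames) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-frame signature magnitudes have already been estimated, takes the mean of the first few frames as the reference, uses a fixed cutoff in place of the statistically optimal threshold, and uses a majority vote over neighbouring frames as one plausible form of the refinement. The function name `detect_spliced_frames` and its parameters `ref_count`, `threshold`, and `half_window` are hypothetical.

```python
import numpy as np


def detect_spliced_frames(signatures, ref_count=5, threshold=0.5, half_window=1):
    """Flag frames whose signature correlates poorly with a reference.

    signatures  : (n_frames, n_bins) array of per-frame signature magnitudes
                  (assumed to be estimated elsewhere; not shown here).
    ref_count   : number of leading frames assumed authentic (assumption).
    threshold   : fixed correlation cutoff, a placeholder for the
                  statistically optimal threshold mentioned in the abstract.
    half_window : neighbours on each side used in the refinement vote.
    """
    signatures = np.asarray(signatures, dtype=float)
    reference = signatures[:ref_count].mean(axis=0)

    # Pearson correlation of every frame with the reference signature.
    ref_c = reference - reference.mean()
    sig_c = signatures - signatures.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(sig_c, axis=1) * np.linalg.norm(ref_c)
    corr = (sig_c @ ref_c) / np.where(denom == 0, 1.0, denom)

    # Raw decision: True means "suspected spliced".
    raw = corr < threshold

    # Refinement (assumed form): majority vote over each frame and its
    # neighbours, which removes isolated detections.
    padded = np.pad(raw.astype(int), half_window, mode="edge")
    window = 2 * half_window + 1
    votes = np.convolve(padded, np.ones(window, dtype=int), mode="valid")
    refined = votes * 2 > window

    return corr, raw, refined
```

Calling `detect_spliced_frames(frames)` returns the per-frame correlations, the raw thresholded decisions, and the refined decisions. Contiguous runs of `True` in the refined array mark candidate spliced segments, which is how localization would follow from frame-level detection in this sketch.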
