RARS: Recognition of Audio Recording Source Based on Residual Neural Network

With the popularity of mobile devices and the emergence of various audio-editing tools, it has become easier to produce and forge audio files, and criminals can present forged audio as evidence. Audio forensics technology has therefore become particularly important. Audio recording device identification, which can verify the authenticity and uniqueness of the evidence obtained, is one of the promising branches of audio forensics. In this article, we propose a novel neural-network-based framework that uses the device noise feature to identify the recording source from the traces the device leaves during recording. We also propose a new neural network model, RARS (Recognition of Audio Recording Source based on residual neural network). The proposed framework achieves state-of-the-art performance on MOBIPHONE, the only publicly available dataset in this field. Moreover, we build a new dataset based on the latest mobile phones and tablet devices. Our method achieves good performance on both datasets, which indicates that our model has a certain degree of reusability and robustness.
