Paper information - Cross-Representation Transferability of Adversarial Attacks: From Spectrograms to Audio Waveforms

This paper demonstrates that spectrogram-based audio classifiers are susceptible to adversarial attacks and that such attacks transfer to audio waveforms. Several adversarial attacks commonly used on images are applied to Mel-frequency and short-time Fourier transform (STFT) spectrograms, and the perturbed spectrograms fool a 2D convolutional neural network (CNN). The perturbations are visually imperceptible to humans. Furthermore, audio waveforms reconstructed from the perturbed spectrograms also fool a 1D CNN trained on the original audio. Experiments on a dataset of Western music show that the 2D CNN achieves a mean accuracy of up to 81.87% on legitimate examples, which drops to 12.09% on adversarial examples. Likewise, the 1D CNN achieves a mean accuracy of up to 78.29% on original audio samples, which drops to 27.91% on adversarial waveforms reconstructed from the perturbed spectrograms.
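The pipeline described in the abstract (perturb a spectrogram, then reconstruct a waveform from it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's setup: a toy linear classifier stands in for the 2D CNN, the fast gradient sign method (FGSM) stands in for the image attacks, which the abstract does not name, and the waveform is rebuilt by reusing the original phase over non-overlapping frames. The names `FRAME`, `WEIGHTS`, `BIAS`, and `EPS` are hypothetical.

```python
import cmath
import math

FRAME = 8                    # hypothetical frame length in samples
WEIGHTS = [0, 1, -1, -1, 0]  # hypothetical per-bin weights of a toy linear classifier
BIAS = -7.0                  # hypothetical bias of the toy classifier
EPS = 0.3                    # hypothetical attack strength

def stft(wave):
    """One-sided DFT of each non-overlapping frame -> (magnitudes, phases)."""
    mags, phases = [], []
    for start in range(0, len(wave) - FRAME + 1, FRAME):
        frame = wave[start:start + FRAME]
        bins = [sum(x * cmath.exp(-2j * math.pi * k * t / FRAME)
                    for t, x in enumerate(frame))
                for k in range(FRAME // 2 + 1)]
        mags.append([abs(b) for b in bins])
        phases.append([cmath.phase(b) for b in bins])
    return mags, phases

def istft(mags, phases):
    """Rebuild a real waveform from one-sided magnitudes and phases."""
    wave = []
    for m_row, p_row in zip(mags, phases):
        half = [m * cmath.exp(1j * p) for m, p in zip(m_row, p_row)]
        # Hermitian symmetry restores the negative-frequency bins.
        full = half + [half[k].conjugate() for k in range(FRAME // 2 - 1, 0, -1)]
        for t in range(FRAME):
            s = sum(c * cmath.exp(2j * math.pi * k * t / FRAME)
                    for k, c in enumerate(full))
            wave.append(s.real / FRAME)
    return wave

def score(mags):
    """Toy linear classifier: a positive score means class +1."""
    return BIAS + sum(w * m for row in mags for w, m in zip(WEIGHTS, row))

def fgsm(mags, label):
    """One FGSM step on the magnitudes, clipped so they stay non-negative.

    For a logistic loss on a linear score, the sign of the loss gradient
    with respect to each input equals sign(-label * weight).
    """
    sign = lambda v: (v > 0) - (v < 0)
    return [[max(0.0, m + EPS * sign(-label * w))
             for w, m in zip(WEIGHTS, row)] for row in mags]

# A two-frame sine wave whose energy sits in frequency bin 1.
wave = [math.sin(2 * math.pi * t / FRAME) for t in range(2 * FRAME)]
mags, phases = stft(wave)

adv_mags = fgsm(mags, label=1)        # perturb the spectrogram
adv_wave = istft(adv_mags, phases)    # carry the perturbation to the waveform
roundtrip_mags, _ = stft(adv_wave)    # spectrogram of the adversarial waveform

clean_score = score(mags)
adv_score = score(adv_mags)
roundtrip_score = score(roundtrip_mags)
max_dev = max(abs(a - b) for a, b in zip(wave, adv_wave))

print("clean score:", clean_score)
print("adversarial spectrogram score:", adv_score)
print("score after waveform round trip:", roundtrip_score)
print("largest sample change:", max_dev)
```

In this toy setting the clean score is positive and the adversarial score is negative, and the perturbation survives the round trip through the waveform. The sketch does not show transfer to a separately trained waveform model such as the 1D CNN in the paper; it only shows how a spectrogram-domain perturbation becomes a waveform-domain one.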

Alceu de Souza Britto | Alessandro Lameiras Koerich | Mohammad Esmaeilpour | Sajjad Abdoli | Karl Michel Koerich
