Environment Transfer for Distributed Systems

Collecting enough data to represent the wide range of acoustic environments is a critical problem for distributed acoustic machine learning. Several audio data augmentation techniques have been introduced to address this problem, but they are largely limited to simple manipulations of existing data and cannot cover the variability of real environments. We propose a method that extends a technique previously used to transfer acoustic style textures between audio clips: it transfers audio signatures between environments for distributed acoustic data augmentation. This paper also devises metrics to evaluate the generated acoustic data based on classification accuracy and content preservation. A series of experiments conducted on the UrbanSound8K dataset shows that the proposed method generates better audio data with the transferred environmental features while preserving the content features.
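The texture-transfer technique the abstract refers to appears to follow the Gatys-style formulation: optimize a spectrogram so that its CNN features match the content clip while its Gram-matrix statistics match the target-environment clip. Below is a minimal PyTorch sketch of that idea; the `FeatureNet` architecture, layer choices, step count, and loss weights are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of Gatys-style transfer on magnitude spectrograms.
# Inputs are assumed to be (batch, 1, freq, time) tensors; all
# hyperparameters here are illustrative, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureNet(nn.Module):
    """Small fixed CNN used only as a feature extractor."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)

    def forward(self, x):
        f1 = F.relu(self.conv1(x))
        f2 = F.relu(self.conv2(f1))
        return f1, f2

def gram(feat):
    # Channel-correlation (Gram) matrix: captures the "texture" of the
    # environment independently of where events occur in time.
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def transfer(content_spec, style_spec, steps=500, style_weight=1e4):
    """Optimize a spectrogram whose features track the content clip
    and whose Gram statistics track the environment (style) clip."""
    net = FeatureNet().eval()
    for p in net.parameters():
        p.requires_grad_(False)

    with torch.no_grad():
        content_feats = net(content_spec)
        style_grams = [gram(f) for f in net(style_spec)]

    x = content_spec.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        feats = net(x)
        content_loss = F.mse_loss(feats[1], content_feats[1])
        style_loss = sum(F.mse_loss(gram(f), g)
                         for f, g in zip(feats, style_grams))
        (content_loss + style_weight * style_loss).backward()
        opt.step()
    return x.detach()
```

The optimized magnitude spectrogram can then be inverted back to a waveform, for example with Griffin-Lim phase reconstruction; the balance between `content_loss` and `style_loss` controls how strongly the environmental signature overwrites the original content.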
