Sinusoidal wave generating network based on adversarial learning and its application: synthesizing frog sounds for data augmentation

Simulators that generate observations based on theoretical models can be important tools for the development, prediction, and assessment of signal processing algorithms. Designing such simulators, however, requires painstaking effort to construct mathematical models tailored to each application, and complex models are sometimes necessary to represent the variety of real phenomena. In contrast, obtaining synthetic observations from generative models learned from real observations often requires much less effort. This paper proposes a generative model based on adversarial learning. Because observations are typically signals composed of a linear combination of sinusoidal waves and random noise, sinusoidal wave generating networks are first designed within an adversarial network framework, and audio waveforms are then generated using the proposed network. Several approaches to designing the objective function of the proposed network for adversarial learning are investigated experimentally. In addition, amphibian sound classification is performed using a convolutional neural network trained on both real and synthetic sounds. Both qualitative and quantitative results show that the proposed generative model produces realistic signals and is helpful for data augmentation and data analysis.
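The observation model the abstract relies on (a linear combination of sinusoidal waves plus random noise) can be written as x[n] = sum_k A_k sin(2π f_k n / f_s + φ_k) + e[n]. The sketch below synthesizes a signal from that model directly. It is not the proposed adversarial network. It only illustrates the kind of signal the generator is designed to produce. The function name `synthesize`, the choice of white Gaussian noise for e[n], and all amplitudes, frequencies, phases, and the sample rate are hypothetical assumptions.

```python
import math
import random


def synthesize(components, noise_std, sample_rate, num_samples, seed=None):
    """Sample x[n] = sum_k A_k * sin(2*pi*f_k*n/fs + phi_k) + e[n].

    components: list of (amplitude, frequency_hz, phase_rad) tuples (hypothetical values).
    noise_std: standard deviation of the additive noise e[n], assumed white Gaussian here.
    """
    rng = random.Random(seed)  # seeded generator so the noise is reproducible
    signal = []
    for n in range(num_samples):
        t = n / sample_rate  # time of sample n in seconds
        # Linear combination of the sinusoidal components at time t.
        value = sum(a * math.sin(2.0 * math.pi * f * t + phi) for a, f, phi in components)
        # Additive random noise term.
        value += rng.gauss(0.0, noise_std)
        signal.append(value)
    return signal


if __name__ == "__main__":
    # Two hypothetical partials (a fundamental and its second harmonic), sampled at 16 kHz.
    parts = [(1.0, 440.0, 0.0), (0.5, 880.0, math.pi / 4)]
    x = synthesize(parts, noise_std=0.05, sample_rate=16000, num_samples=16000, seed=0)
    print(len(x), round(max(abs(v) for v in x), 3))
```

A hand-built simulator would need every component chosen explicitly like this. The abstract's point is that a generator trained adversarially on real recordings can learn such parameters instead.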
