Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation

Purpose
Create and share a MATLAB library that implements data augmentation algorithms for audio data. This study aims to help machine learning researchers improve their models with the augmentation algorithms proposed by the authors.

Design/methodology/approach
The authors structured their library into two groups of methods: methods that augment raw audio data and methods that augment spectrograms. The paper describes the structure of the library and briefly explains how each function works. The authors then perform experiments showing that the library is effective.

Findings
The authors show that the library is effective on a competitive dataset. They evaluate multiple data augmentation approaches of their own design and show that these approaches improve performance.

Originality/value
Before this work, no MATLAB library specifically designed for audio data augmentation was available. The authors are the first to provide an efficient, parallel implementation of a large number of such algorithms.
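The Design/methodology/approach paragraph separates augmentation of raw audio from augmentation of spectrograms. The Python sketch below illustrates that split with generic, textbook techniques chosen here for illustration:

- two waveform-level augmentations: a gain change, and additive white noise at a target signal-to-noise ratio;
- one spectrogram-level augmentation: random frequency and time masking in the style of SpecAugment.

The function names and parameters (`apply_gain`, `add_white_noise`, `mask_spectrogram`) are hypothetical. They are not Audiogmenter's MATLAB API and do not reproduce the authors' specific algorithms.

```python
import math
import random


def apply_gain(signal, gain_db):
    """Waveform-level: scale every sample by a gain given in decibels."""
    factor = 10 ** (gain_db / 20)  # amplitude ratio corresponding to gain_db
    return [x * factor for x in signal]


def add_white_noise(signal, snr_db, rng):
    """Waveform-level: add Gaussian white noise at a target SNR in dB."""
    if not signal:
        return []
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))  # SNR_dB = 10*log10(Ps/Pn)
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def mask_spectrogram(spec, max_freq_width, max_time_width, rng):
    """Spectrogram-level: zero one random band of frequency bins and one
    random block of time frames (SpecAugment-style masking).

    `spec` is a list of rows (frequency bins), each a list of time frames.
    The input is not modified; a masked copy is returned.
    """
    out = [row[:] for row in spec]
    if not out or not out[0]:
        return out
    n_freq, n_time = len(out), len(out[0])

    # Frequency mask: f consecutive bins starting at bin f0.
    f = rng.randint(0, min(max_freq_width, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    # Time mask: t consecutive frames starting at frame t0, across all bins.
    t = rng.randint(0, min(max_time_width, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the augmentations are reproducible
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s, 440 Hz
    noisy = add_white_noise(apply_gain(tone, -3.0), snr_db=20.0, rng=rng)
    spec = [[1.0] * 50 for _ in range(40)]  # dummy 40-bin x 50-frame spectrogram
    masked = mask_spectrogram(spec, max_freq_width=8, max_time_width=10, rng=rng)
    zeros = sum(v == 0.0 for row in masked for v in row)
    print(f"{len(noisy)} noisy samples, {zeros} masked spectrogram cells")
```

The sketch operates on plain lists so that it needs no third-party packages. A practical pipeline would work on numeric arrays and apply the masking to a spectrogram computed from the waveform, for example with a short-time Fourier transform.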
