Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation

Purpose: Create and share a MATLAB library that implements data augmentation algorithms for audio data. This study aims to help machine learning researchers improve their models using the algorithms proposed by the authors.

Design/methodology/approach: The authors structured their library into methods that augment raw audio data and methods that augment spectrograms. In the paper, the authors describe the structure of the library and briefly explain how each function works. They then perform experiments to show that the library is effective.

Findings: Using a competitive dataset, the authors show that the library is efficient. They test multiple data augmentation approaches that they propose and show that these approaches improve performance.

Originality/value: A MATLAB library specifically designed for data augmentation was not previously available. The authors are the first to provide an efficient, parallel implementation of a large number of augmentation algorithms.
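The abstract separates augmentation methods into two families: methods that act on the raw waveform and methods that act on a spectrogram. The snippet below is a minimal Python/NumPy sketch of one example from each family. It is not Audiogmenter's API, because Audiogmenter is a MATLAB toolbox. The function names (add_noise_at_snr, time_shift, magnitude_spectrogram, mask_spectrogram) and all parameter values are hypothetical and chosen only for illustration. The spectrogram step uses SpecAugment-style frequency and time masking, which is a well-known technique and is not necessarily how the toolbox implements it.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the Audiogmenter (MATLAB) API.

def add_noise_at_snr(signal, snr_db, rng):
    """Raw-audio augmentation: add white Gaussian noise at a target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def time_shift(signal, max_shift, rng):
    """Raw-audio augmentation: circularly shift the waveform by up to max_shift samples."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(signal, shift)

def magnitude_spectrogram(signal, n_fft=256, hop=128):
    """Simple STFT magnitude with a Hann window, shaped (frequency bins, frames)."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

def mask_spectrogram(spec, max_freq_width, max_time_width, rng):
    """Spectrogram augmentation (SpecAugment-style): zero one random band of
    frequency rows and one random block of time columns."""
    out = spec.copy()
    n_freq, n_time = out.shape
    f = int(rng.integers(0, max_freq_width + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, max_time_width + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out

# Usage: a one-second 440 Hz tone sampled at 16 kHz.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)

noisy = add_noise_at_snr(clean, snr_db=10, rng=rng)   # waveform-level
shifted = time_shift(clean, max_shift=800, rng=rng)   # waveform-level
spec = magnitude_spectrogram(noisy)
masked = mask_spectrogram(spec, max_freq_width=10, max_time_width=8, rng=rng)  # spectrogram-level
```

Waveform methods change the signal before any time-frequency transform. Spectrogram methods operate on the 2-D representation that a CNN classifier would actually consume. A pipeline can combine the two by applying one or more waveform augmentations first and then masking the resulting spectrogram.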

Loris Nanni | Gianluca Maguolo | Michelangelo Paci | Ludovico Bonan
