With the computational power available today, machine learning has become a very active field with applications in everyday life. One of its biggest challenges is the classification task, which depends heavily on data representation (the preprocessing part of a machine learning algorithm). Linearly separable data can be classified easily, so the aim of preprocessing is to obtain well-represented data by mapping raw data into a "feature space" where simple classifiers can be used efficiently. For example, to date most audio and bioacoustic applications rely on MFCC features. We present here a toolbox, written in C++, that provides the basic tools for audio representation through an implementation of the Scattering Network, which brings a new and powerful solution to these tasks. We focused our implementation on massive datasets and server applications. The reference toolkit for scattering analysis is SCATNET from Mallat et al. (http://www.di.ens.fr/data/software/scatnet/). This toolbox is an attempt to make some of the SCATNET features more tractable for Big Data challenges. Furthermore, its use is not limited to machine learning preprocessing: it can also be used for more advanced biological analysis, such as the study of animal communication behaviours or any biological study related to signal analysis. The implementation provides out-of-the-box executables that can be run with simple commands, without a graphical interface, and is thus suited for server applications. As we will review in the next part, we need to perform data manipulation on huge datasets, so fast and efficient implementations become important in order to deal with this new "Big Data" era.