Recognition of sounds from musical instruments: a critical review and experiments

In this paper we review methods for the recognition and classification of musical timbre. The human ear, especially that of well-trained listeners and musicians, is highly capable of recognizing the sounds of musical instruments from several cues, including attack and timbre. Building an automatic system for sound recognition, however, remains a challenging problem. Various results are reported in the literature, but they depend strongly on the experimental conditions, such as the size of the database and the number of instruments. The timbre attribute of a sound is characterized by the spectral envelope. Most timbre recognition methods are based on the extraction of cepstral coefficients arranged on a perceptual scale, which allows us to separate the contribution of the fine harmonic structure from that of the spectral envelope. A well-suited perceptual representation is given by the mel scale, on which the Mel Frequency Cepstral Coefficients (MFCC) method is based. The MFCCs of reference sounds in a database are used to build a sound classifier. We experimented with two approaches: a nonlinear classifier based on a trainable neural network, and a statistical pattern classifier based on data mining techniques. The first approach uses a neural network based on the Self-Organizing Map (after Kohonen), which performs a nonlinear projection onto a two-dimensional map. The map is trained on a set of timbres from 30 musical instruments and generates a timbral space identified by the firing patterns of the excited neurons. In the second approach, which is better suited to a classification task, the dimensionality of the original space of MFCC parameters is reduced by projections based on principal component analysis (PCA). Good clustering results are obtained even with a drastic reduction to a two-dimensional parameter space.
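
As an illustration of the second approach (MFCC extraction followed by a PCA projection to two dimensions), the following is a minimal sketch under stated assumptions, not the implementation used in the experiments. The frame length, hop size, number of mel filters, number of coefficients, the particular mel formula, and the synthetic harmonic tones that stand in for instrument recordings are all assumptions chosen for the example. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; the paper does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers equally spaced on the mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc(signal, sr, n_fft=1024, hop=512, n_filters=26, n_coeffs=13):
    # Windowed frames -> power spectrum -> log mel energies -> DCT-II.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))  # unnormalized DCT-II
    return log_mel @ dct.T  # shape: (n_frames, n_coeffs)

def pca_project(X, n_components=2):
    # Center the data and project onto the leading principal directions (via SVD).
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def harmonic_tone(f0, rolloff, sr=16000, dur=0.5, n_harm=10):
    # Synthetic stand-in for an instrument: harmonic h has amplitude 1 / h**rolloff,
    # so `rolloff` controls the spectral envelope.
    t = np.arange(int(sr * dur)) / sr
    return sum(np.sin(2 * np.pi * f0 * h * t) / h ** rolloff
               for h in range(1, n_harm + 1))

# One feature vector per sound: the MFCCs averaged over all frames.
features = np.array([mfcc(harmonic_tone(f0, r), 16000).mean(axis=0)
                     for f0 in (220.0, 330.0, 440.0)
                     for r in (0.5, 1.0, 2.0)])
points = pca_project(features)  # one 2-D point per sound
```

Each row of `points` is a position in a two-dimensional space derived from the MFCCs, which could then be clustered or plotted. Averaging the coefficients over frames discards the attack and other temporal cues, so this simplification is only meant to show the data flow.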