Environmental sound recognition using time-frequency intersection patterns

Environmental sound recognition is an important function of robots and intelligent computer systems. In this research, we tried to use a multi-stage perceptron type neural network system for environmental sound recognition. The input data is the one-dimensional combination of instantaneous spectrum at power peak and the power pattern in time domain. Since for almost environmental sounds, their spectrum changes are not remarkable compared with speech or voice, the combination of power and frequency pattern will preserve the major features of environmental sounds but with drastically reduced data. Two experiments were conducted using an original database and a database created by the RWCP. The recognition rate for about 45 data kinds of environmental sound was about 92%. The merit of this method is the use of a one-dimensional input which combines the power pattern and the instantaneous spectrum of sound data. Comparing with the method using only instantaneous spectrum, the new method are sufficient for larger sound database and the recognition rate was increased about 12%. The results are also comparable with the methods of HMM, while those methods require 2-dimensional spectrum time series data and more complicated computation.