Environmental Sound Recognition With Time–Frequency Audio Features

This paper considers the task of recognizing environmental sounds in order to understand the scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs), which describe the spectral shape of the audio. Environmental sounds, such as the chirping of insects and the sound of rain, are typically noise-like with a broad, flat spectrum, yet they may carry strong temporal-domain signatures. However, only a few temporal-domain features have previously been developed to characterize such diverse audio signals. Here, we perform an empirical feature analysis for audio environment characterization and propose using the matching pursuit (MP) algorithm to obtain effective time-frequency features. The MP-based method uses a dictionary of atoms for feature selection, resulting in a flexible, intuitive, and physically interpretable set of features. The MP-based features are used to supplement the MFCC features and yield higher recognition accuracy for environmental sounds. Extensive experiments are conducted to demonstrate the effectiveness of these joint features for unstructured environmental sound classification, including listening tests that study human recognition capabilities. Our recognition system is shown to perform comparably to human listeners.
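As an illustration of the idea, the following is a minimal sketch of greedy matching pursuit over a small dictionary of Gabor-like atoms, written in Python with NumPy. It is not the implementation used in the paper: the atom definition, the parameter grid, and the helper names (`gabor_atom`, `build_dictionary`, `matching_pursuit`, `mp_features`) are assumptions made for this example, and summarizing the selected atoms by the mean and standard deviation of their scale and frequency is one plausible choice of feature vector rather than a confirmed detail.

```python
import numpy as np


def gabor_atom(n, scale, shift, freq):
    """Unit-norm real atom of length n: Gaussian window times a cosine.

    Assumed form for illustration; freq is in cycles per sample (< 0.5).
    """
    t = np.arange(n)
    window = np.exp(-np.pi * ((t - shift) / scale) ** 2)
    atom = window * np.cos(2 * np.pi * freq * (t - shift))
    return atom / np.linalg.norm(atom)


def build_dictionary(n, scales, shifts, freqs):
    """Return (atoms, params): one row per (scale, shift, freq) combination."""
    params = [(s, u, f) for s in scales for u in shifts for f in freqs]
    atoms = np.stack([gabor_atom(n, s, u, f) for s, u, f in params])
    return atoms, np.array(params, dtype=float)


def matching_pursuit(x, atoms, n_iter):
    """Greedy MP: repeatedly pick the atom most correlated with the residual."""
    residual = np.asarray(x, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        corr = atoms @ residual               # inner product with every atom
        k = int(np.argmax(np.abs(corr)))      # best-matching atom
        residual = residual - corr[k] * atoms[k]
        picks.append((k, float(corr[k])))
    return picks, residual


def mp_features(params, picks):
    """Hypothetical summary: mean/std of scale and frequency of chosen atoms."""
    chosen = params[[k for k, _ in picks]]
    scale, freq = chosen[:, 0], chosen[:, 2]
    return np.array([scale.mean(), scale.std(), freq.mean(), freq.std()])


if __name__ == "__main__":
    atoms, params = build_dictionary(
        n=256, scales=[4, 16, 64], shifts=range(0, 256, 32), freqs=[0.05, 0.1, 0.2, 0.4]
    )
    frame = np.random.default_rng(0).standard_normal(256)  # stand-in audio frame
    picks, residual = matching_pursuit(frame, atoms, n_iter=5)
    print(mp_features(params, picks))
```

In a joint-feature setup of the kind described above, such a per-frame summary would be concatenated with the MFCC vector of the same frame before classification. The dictionary size and the number of iterations here are arbitrary.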
