Gabor-Based Nonuniform Scale-Frequency Map for Environmental Sound Classification in Home Automation

This work presents a novel feature extraction approach, the nonuniform scale-frequency map, for environmental sound classification in home automation. For each audio frame, the most important atoms are selected from a Gabor dictionary using the Matching Pursuit algorithm. Phase and position information are discarded, and the scale and frequency of the selected atoms are used to construct a scale-frequency map. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are then applied to the scale-frequency map to generate the proposed feature. In the classification phase, a segment-level multiclass Support Vector Machine (SVM) is used. Experiments on a 17-class sound database show that the proposed approach achieves an accuracy of 86.21%. A comparison further shows that the proposed approach outperforms other time-frequency methods.
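
The atom-selection and map-construction steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform grid of scales and frequencies, the zero-phase real atoms, and the choice to accumulate squared coefficients into the map are all assumptions made for illustration. The PCA, LDA, and SVM stages are omitted.

```python
import numpy as np

def gabor_dictionary(n, scales, freqs, positions):
    """Build unit-norm real Gabor atoms (hypothetical helper).

    Returns the atoms as rows plus the (scale index, frequency index)
    of each atom; position is deliberately not recorded.
    """
    t = np.arange(n)
    atoms, labels = [], []
    for si, s in enumerate(scales):
        for fi, f in enumerate(freqs):
            for u in positions:
                # Gaussian window of width s centred at u, modulated at f cycles/sample.
                g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(2 * np.pi * f * (t - u))
                atoms.append(g / np.linalg.norm(g))
                labels.append((si, fi))
    return np.array(atoms), labels

def scale_frequency_map(frame, atoms, labels, shape, n_atoms=10):
    """Greedy Matching Pursuit; accumulate atom energy by (scale, frequency)."""
    residual = frame.astype(float).copy()
    sf_map = np.zeros(shape)
    for _ in range(n_atoms):
        corr = atoms @ residual              # inner product with every atom
        k = int(np.argmax(np.abs(corr)))     # best-matching atom
        residual -= corr[k] * atoms[k]       # remove its contribution
        si, fi = labels[k]
        sf_map[si, fi] += corr[k] ** 2       # assumption: energy-weighted map
    return sf_map, residual

# Toy frame: a single atom with scale 16 and frequency 0.1, scaled by 2.
scales, freqs = [4, 16, 64], [0.05, 0.1, 0.2]
atoms, labels = gabor_dictionary(256, scales, freqs, range(0, 256, 32))
frame = 2.0 * gabor_dictionary(256, [16], [0.1], [128])[0][0]
sf_map, residual = scale_frequency_map(frame, atoms, labels, (len(scales), len(freqs)))
```

In this toy case the energy lands in the cell for the second scale and second frequency. In a full pipeline, each frame's map would be flattened into a vector before dimensionality reduction and classification.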
