Generalized Recognition of Sound Events: Approaches and Applications

This chapter surveys contemporary approaches to automatic sound recognition and discusses the benefits of real-world applications of this technology. We identify the common aspects and subtle differences among these diverse application areas and review state-of-the-art systems. In this context, we suggest that there is considerable room for knowledge transfer between the different subfields of sound classification, which appear to evolve independently and have reached different levels of maturity. Particular emphasis is given to lessons learned from speech recognition, which, together with speaker recognition, was among the first applications of sound classification to reach commercial products on a large scale. Special attention is paid to emerging applications such as environmental monitoring and bioacoustic identification, and to applications in music, which have already started to change everyday life.
