An Overview of Audio Event Detection Methods from Feature Extraction to Classification

ABSTRACT Audio streams, such as news broadcasts, meeting-room recordings, and special video, comprise sound from a wide variety of sources. Detecting audio events such as speech, coughing, and gunshots is the task of intelligent audio event detection (AED). Substantial attention to AED in applications such as security, speech recognition, speaker recognition, home care, and health monitoring has motivated extensive research on the topic. Deploying AED becomes more complicated when it goes beyond merely highlighting audio events and must also select, across feature extraction and classification, the features that yield high detection accuracy. To date, a wide range of detection systems based on intelligent techniques have been used to build machine learning-based AED schemes. Nevertheless, previous studies do not include a state-of-the-art review of the proficiency and significance of such methods for solving audio event detection problems. The major contribution of this work is to review existing AED schemes and categorize them into preprocessing, feature extraction, and classification methods. The study also analyzes the importance of these algorithms and methodologies, along with their proficiency and limitations. It further offers a critical comparison of audio detection methods and algorithms in terms of accuracy and false alarms on different types of datasets.
