Acoustic event detection and localization using distributed microphone arrays

Automatic acoustic scene analysis is a complex task that involves several functionalities: detection (time), localization (space), separation, recognition, etc. This thesis focuses on both acoustic event detection (AED) and acoustic source localization (ASL) when several sources may be simultaneously present in a room. In particular, the experimental work is carried out in a meeting-room scenario. Unlike previous works, which either employed models of all possible sound combinations or additionally used video signals, this thesis tackles the problem of time-overlapping sounds by exploiting the signal diversity that results from the use of multiple microphone array beamformers. The core of the thesis is a rather computationally efficient approach that consists of three processing stages. In the first stage, a set of (null) steering beamformers carries out diverse partial signal separations, using multiple arbitrarily located linear microphone arrays, each composed of a small number of microphones. In the second stage, each beamformer output goes through a classification step, which uses models for all the targeted sound classes (HMM-GMM in the experiments). In the third stage, the classifier scores, either intra- or inter-array, are combined using a probabilistic criterion (such as MAP) or a machine learning fusion technique (the fuzzy integral (FI) in the experiments). This processing scheme is applied to a set of problems of increasing complexity, which are defined by the assumptions made regarding the identities (plus time endpoints) and/or the positions of the sounds. The thesis report starts with the problem of unambiguously mapping the identities to the positions, continues with AED (positions assumed) and ASL (identities assumed), and ends with the integration of AED and ASL in a single system, which does not need any assumption about identities or positions.
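The first-stage idea of a null-steering beamformer can be illustrated with a minimal narrowband sketch. It is not the thesis implementation: the uniform linear array geometry, the single analysis frequency, the far-field model, the minimum-norm solution, and all function names and numeric values below are assumptions chosen only to show how a unit response towards one source and a null towards another can be imposed.

```python
import cmath
import math


def steering_vector(theta_deg, num_mics, spacing_m, freq_hz, c=343.0):
    """Far-field narrowband steering vector of a uniform linear array (assumed model)."""
    phase = 2.0 * math.pi * freq_hz * spacing_m * math.sin(math.radians(theta_deg)) / c
    return [cmath.exp(-1j * phase * m) for m in range(num_mics)]


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def null_steering_weights(a_target, a_null):
    """Minimum-norm weights w with a_target^H w = 1 and a_null^H w = 0."""
    # Gram matrix G = C^H C for the constraint matrix C = [a_target, a_null].
    g11, g12 = inner(a_target, a_target), inner(a_target, a_null)
    g21, g22 = inner(a_null, a_target), inner(a_null, a_null)
    det = g11 * g22 - g12 * g21
    # Solve G x = [1, 0]^T in closed form, then w = C x.
    x1 = g22 / det
    x2 = -g21 / det
    return [x1 * t + x2 * n for t, n in zip(a_target, a_null)]


# Hypothetical setup: 4 microphones, 5 cm spacing, 1 kHz,
# one source kept at +20 degrees and one source nulled at -40 degrees.
a_t = steering_vector(20.0, 4, 0.05, 1000.0)
a_n = steering_vector(-40.0, 4, 0.05, 1000.0)
w = null_steering_weights(a_t, a_n)

print("response towards kept source:  ", abs(inner(a_t, w)))
print("response towards nulled source:", abs(inner(a_n, w)))
```

Swapping the roles of the two directions gives a second, complementary partial separation, which is one way to picture the "diverse" beamformer outputs that feed the classifiers.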
The evaluation experiments are carried out in a meeting-room scenario in which two sources overlap in time; one of them is always speech and the other is an acoustic event from a pre-defined set. Two different databases are used: one is produced by merging signals actually recorded in the UPC's department smart-room, and the other consists of overlapping sound signals directly recorded in the same room in a rather spontaneous way. The experimental results with a single array show that the proposed detection system performs better than either the model-based system or a system based on blind source separation. Moreover, the product-rule-based combination and the FI-based fusion of the scores from multiple arrays improve the accuracies further. In addition, the posterior position assignment is performed with a very small error rate. Regarding ASL, and assuming an accurate AED system output, the 1-source localization performance of the proposed system is slightly better than that of the widely used SRP-PHAT system working in an event-based mode, and it performs significantly better than SRP-PHAT in the more complex 2-source scenario. Finally, although the joint system suffers a slight degradation in classification accuracy with respect to the case where the source positions are known, it has the advantage of carrying out the two tasks, recognition and localization, with a single system, and it allows the inclusion of information about the prior probabilities of the source positions. It is also worth noting that, although the acoustic scenario used for experimentation is rather limited, the approach and its formalism were developed for a general case, where the number and identities of sources are not constrained.
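The product-rule combination of scores across arrays mentioned above can be sketched as follows. This is only an illustration under stated assumptions: each array is assumed to deliver one normalized score per class, the arrays are treated as conditionally independent, and the class labels and numbers are hypothetical, not taken from the thesis databases.

```python
import math


def product_rule_fusion(scores_per_array):
    """Fuse per-array class scores by multiplying them class-wise and renormalizing.

    scores_per_array: list of lists, one inner list of class scores per array.
    The product is computed in the log domain to avoid numerical underflow.
    """
    num_classes = len(scores_per_array[0])
    log_sums = [
        sum(math.log(max(scores[k], 1e-300)) for scores in scores_per_array)
        for k in range(num_classes)
    ]
    peak = max(log_sums)
    unnormalized = [math.exp(v - peak) for v in log_sums]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical class labels and scores from three arrays.
classes = ["door_knock", "keyboard", "steps"]
scores = [
    [0.6, 0.3, 0.1],  # array 1
    [0.5, 0.4, 0.1],  # array 2
    [0.2, 0.7, 0.1],  # array 3
]
fused = product_rule_fusion(scores)
decision = classes[fused.index(max(fused))]
print(decision, fused)
```

In this toy example two of the three arrays favour the first class, yet the fused decision is the second class, because the product rule strongly penalizes a class that any single array scores low.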
