Temporal ICA for classification of acoustic events in a kitchen environment

We describe a feature extraction method for general audio modeling that uses a temporal extension of Independent Component Analysis (ICA), and we demonstrate its utility on a sound classification task in a kitchen environment. Our approach accounts for temporal dependencies across multiple analysis frames, much like the standard audio modeling technique of appending first and second temporal derivatives to the feature set. Using a real-world dataset of kitchen sounds, we show that our approach outperforms a canonical version of this standard front end, mel-frequency cepstral coefficients (MFCCs), which have been applied successfully in automatic speech recognition tasks.
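The two ways of capturing temporal context mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stack_frames` and `add_deltas`, the 13-dimensional frames, and the context length of 5 are all assumptions chosen for illustration, and simple finite differences stand in for the regression-based derivatives often used in speech front ends. The stacked vectors are the kind of input to which an ICA basis could then be fitted.

```python
import numpy as np

def stack_frames(frames, context):
    """Concatenate `context` consecutive frames into one long vector per position.

    frames: array of shape (n_frames, n_dims); returns (n_frames - context + 1, context * n_dims).
    """
    n_frames, _ = frames.shape
    n_out = n_frames - context + 1
    if n_out <= 0:
        raise ValueError("need at least `context` frames")
    return np.stack([frames[i:i + context].reshape(-1) for i in range(n_out)])

def add_deltas(frames):
    """Append first and second temporal differences to each frame."""
    d1 = np.gradient(frames, axis=0)   # first temporal derivative (finite differences)
    d2 = np.gradient(d1, axis=0)       # second temporal derivative
    return np.hstack([frames, d1, d2])

# Hypothetical input: 100 analysis frames of 13 coefficients each (random stand-in data).
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 13))

stacked = stack_frames(frames, context=5)   # multi-frame vectors, shape (96, 65)
with_deltas = add_deltas(frames)            # static + delta + delta-delta, shape (100, 39)
```

Stacking keeps the raw multi-frame pattern and leaves it to a learned transform to find structure across time, whereas the derivative features summarize temporal change with a fixed, hand-chosen operator.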
