Spatial Cepstrum as a Spatial Feature Using a Distributed Microphone Array for Acoustic Scene Analysis

In this paper, we propose the spatial cepstrum, a robust and efficient spatial feature for acoustic scene analysis that exploits the spatial information obtained from a distributed microphone array. Analogously to the cepstrum, which is widely used as a spectral feature, the spatial cepstrum converts the logarithm of the amplitudes observed across multiple channels into a feature vector by a linear orthogonal transformation. In general, this transformation is obtained by principal component analysis (PCA). We also show that, for a circularly symmetric microphone arrangement in an isotropic sound field, PCA is identical to the inverse discrete Fourier transform, so that the spatial cepstrum corresponds exactly to the cepstrum. The proposed approach does not require the microphone positions and is robust against synchronization mismatch between channels, which makes it suitable for use with a distributed microphone array. Experimental results obtained using actual environmental sounds verify the validity of our approach even when PCA-based dimensionality reduction yields a feature dimension smaller than the original one. The results also indicate that the proposed method is satisfactorily robust for observations with synchronization mismatch between channels.
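The transformation described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function names, the use of frame-wise RMS as the amplitude, the small constant added before the logarithm, and the mean-centering step are all assumptions made for illustration. The sketch computes the log-amplitude vector over channels for each frame, estimates an orthogonal basis by PCA (eigendecomposition of the covariance matrix), and projects onto the leading components.

```python
import numpy as np

def log_amplitude(frames, eps=1e-12):
    """Per-frame, per-channel log amplitude.

    frames: array of shape (n_frames, n_channels, frame_len).
    Amplitude is taken as the RMS over each frame (an assumption);
    eps avoids log(0) for silent frames.
    Returns an array of shape (n_frames, n_channels).
    """
    amp = np.sqrt(np.mean(frames ** 2, axis=-1))
    return np.log(amp + eps)

def fit_pca(log_amp):
    """Estimate an orthogonal basis from training log-amplitude vectors.

    Returns the mean vector, the eigenvectors (as columns) sorted by
    decreasing eigenvalue, and the sorted eigenvalues.
    """
    mean = log_amp.mean(axis=0)
    cov = np.cov(log_amp - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort to descending
    return mean, eigvecs[:, order], eigvals[order]

def spatial_cepstrum(log_amp, mean, basis, n_dims=None):
    """Project log-amplitude vectors onto the PCA basis.

    Keeping only the first n_dims components performs the
    dimensionality reduction mentioned in the abstract.
    """
    feats = (log_amp - mean) @ basis
    return feats if n_dims is None else feats[:, :n_dims]

if __name__ == "__main__":
    # Synthetic 4-channel data with random per-frame, per-channel gains
    rng = np.random.default_rng(0)
    gains = np.exp(rng.standard_normal((200, 4)))
    frames = rng.standard_normal((200, 4, 256)) * gains[:, :, None]

    q = log_amplitude(frames)
    mean, basis, eigvals = fit_pca(q)
    feats = spatial_cepstrum(q, mean, basis, n_dims=2)
    print(feats.shape)
```

Because the feature depends only on frame-level amplitudes and not on inter-channel phase, a time offset between channels that is small relative to the frame length changes the input to this sketch only slightly, which is consistent with the robustness to synchronization mismatch claimed above. No microphone coordinates appear anywhere in the computation.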
