Acoustic Topic Model for Scene Analysis With Intermittently Missing Observations

We propose a method of acoustic scene analysis for intermittently missing observations, which analyzes acoustic scenes and restores the missing observations simultaneously on the basis of the temporal correlation between acoustic words. One effective strategy for analyzing acoustic scenes is to characterize them as a combination of acoustic words. The acoustic topic model (ATM) is one such technique; it models the process that generates multiple acoustic words. Here, an acoustic word corresponds to a sound category, has a uniform duration, and is defined frame by frame. The ATM assumes that all acoustic words are observed, so it cannot be applied when any acoustic observations are missing. In practice, however, acoustic observations may be missing because of poor recording conditions, transmission loss, or privacy reasons. Exploiting the fact that acoustic words are temporally correlated, the proposed method considers the transition of acoustic words in two ways: first, by modeling the temporal transition of acoustic words directly using a Markov process, and second, by modeling the temporal transition of the hidden states that generate acoustic words using a hidden Markov model. We then incorporate each transition model into the ATM-based process that generates acoustic words. This allows us to analyze acoustic scenes from acoustic words while restoring the missing ones. In our experiments, even when 50% of the observations were missing, the proposed method achieved an acoustic scene classification accuracy close to that obtained with no missing observations. Moreover, the model that considers the hidden-state transition classifies acoustic scenes more accurately than the model that considers the acoustic word transition directly.
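
The idea of restoring missing acoustic words from their temporal correlation can be illustrated with a minimal sketch. It is not the proposed ATM-based model: it only fits a first-order Markov chain to the observed word pairs and fills each gap greedily from its neighbors. The `MISSING` marker, the function names, the smoothing constant `alpha`, and the toy sequence are all assumptions made for illustration.

```python
import numpy as np

MISSING = -1  # hypothetical marker for a frame whose acoustic word is missing


def estimate_transitions(seq, n_words, alpha=1.0):
    """Estimate a first-order transition matrix from observed word pairs.

    Pairs that touch a missing frame are skipped; alpha is an additive
    smoothing constant (an assumption, not taken from the paper).
    """
    counts = np.full((n_words, n_words), alpha, dtype=float)
    for prev, cur in zip(seq[:-1], seq[1:]):
        if prev != MISSING and cur != MISSING:
            counts[prev, cur] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def restore_missing(seq, trans):
    """Fill each missing word, left to right, with the locally most probable one.

    The score of a candidate word combines the transition from the previous
    (possibly restored) word and the transition to the next observed word.
    """
    restored = list(seq)
    for t, word in enumerate(restored):
        if word != MISSING:
            continue
        scores = np.ones(trans.shape[0])
        if t > 0:
            scores = scores * trans[restored[t - 1]]
        if t + 1 < len(restored) and restored[t + 1] != MISSING:
            scores = scores * trans[:, restored[t + 1]]
        restored[t] = int(np.argmax(scores))
    return restored


# Toy sequence over two acoustic words with two missing frames.
observed = [0, 1, 0, 1, MISSING, 1, 0, MISSING, 0, 1]
trans = estimate_transitions(observed, n_words=2)
restored = restore_missing(observed, trans)
print(restored)
```

A greedy fill keeps the sketch short. The method described above instead embeds the transition model in the ATM generative process, so that scene analysis and restoration are carried out jointly.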
