Retrieving sounds by vocal imitation recognition

Vocal imitation is widely used in human communication. In this paper, we propose an approach that automatically recognizes the sound concept underlying a vocal imitation and then retrieves sounds of that concept. Because different acoustic aspects (e.g., pitch, loudness, timbre) are emphasized when imitating different sounds, a key challenge in vocal imitation recognition is extracting appropriate features; hand-crafted features may not work well across a large variety of imitations. Instead, we use a stacked auto-encoder to automatically learn features from a set of vocal imitations in an unsupervised way. A multi-class SVM is then trained on the sound concepts of interest using their training imitations. Given a new vocal imitation of one of these concepts, our system recognizes the underlying concept and returns it with a high rank among all concepts. Experiments show that our system significantly outperforms an MFCC-based comparison system in both classification and retrieval.
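The pipeline described above (unsupervised feature learning with a stacked auto-encoder, followed by a multi-class SVM that ranks sound concepts for a query imitation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the sigmoid activations, the frame-averaging pooling, the RBF kernel, and the use of scikit-learn's SVC in place of LIBSVM are all assumptions made for the sake of a self-contained example, and random arrays stand in for real spectrogram frames of imitations.

```python
# Sketch: stacked auto-encoder features + multi-class SVM for
# vocal-imitation recognition and concept retrieval (illustrative only).
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

def pretrain_layer(x, hidden_dim, epochs=50, lr=1e-3):
    """Greedy layer-wise pretraining of one auto-encoder layer.
    Returns the trained encoder and the encoded representation of x."""
    in_dim = x.shape[1]
    encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
    decoder = nn.Linear(hidden_dim, in_dim)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    x_t = torch.as_tensor(x, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        recon = decoder(encoder(x_t))
        loss = loss_fn(recon, x_t)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return encoder, encoder(x_t).numpy()

def learn_features(frames, hidden_dims=(500, 100)):
    """Stack auto-encoder layers trained in an unsupervised way on
    spectrogram frames of vocal imitations (one row per frame)."""
    encoders, h = [], frames
    for dim in hidden_dims:
        enc, h = pretrain_layer(h, dim)
        encoders.append(enc)
    return encoders

def encode(encoders, frames):
    """Map frames through the stacked encoder and average over time to get
    one feature vector per imitation (a simplifying assumption here)."""
    h = torch.as_tensor(frames, dtype=torch.float32)
    with torch.no_grad():
        for enc in encoders:
            h = enc(h)
    return h.numpy().mean(axis=0)

# --- toy usage with random data standing in for spectrogram frames ---
rng = np.random.default_rng(0)
n_concepts, imitations_per_concept, n_frames, n_bins = 5, 8, 40, 128
all_frames = rng.random((n_concepts * imitations_per_concept * n_frames, n_bins))
encoders = learn_features(all_frames)           # unsupervised feature learning

X, y = [], []
for concept in range(n_concepts):
    for _ in range(imitations_per_concept):
        frames = rng.random((n_frames, n_bins))  # one training imitation
        X.append(encode(encoders, frames))
        y.append(concept)
svm = SVC(kernel="rbf", probability=True).fit(np.array(X), y)

# Retrieval: rank all sound concepts for a new imitation by classifier score.
query = encode(encoders, rng.random((n_frames, n_bins)))
scores = svm.predict_proba(query.reshape(1, -1))[0]
ranking = np.argsort(scores)[::-1]               # concepts ordered by relevance
print("Top-ranked concept:", svm.classes_[ranking[0]])
```

Ranking concepts by the classifier's per-class scores, rather than taking only the top prediction, is what turns the recognizer into a retrieval system: a correct concept that is not ranked first can still be returned near the top of the list.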
