An Ontological Framework for Retrieving Environmental Sounds Using Semantics and Acoustic Content

Organizing a database of user-contributed environmental sound recordings allows sound files to be linked not only through the semantic tags and labels applied to them, but also through their acoustic similarity to other sounds. Two problems are central to navigating such databases: retrieving similar sounds using text- or sound-based queries, and automatically annotating unlabeled sounds. We propose an integrated system that supports text-based retrieval of unlabeled audio, content-based query-by-example, and automatic annotation of unlabeled sound files. To this end, we introduce an ontological framework in which sounds are connected to each other based on the similarity between acoustic features specifically adapted to environmental sounds, while semantic tags and sounds are connected through link weights that are optimized based on user-provided tags. Furthermore, tags are linked to each other through a measure of semantic similarity, which allows for efficient incorporation of out-of-vocabulary tags, that is, tags that do not yet exist in the database. Results on two freely available databases of environmental sounds contributed and labeled by nonexpert users demonstrate effective recall, precision, and average precision scores for both the text-based retrieval and annotation tasks.
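
As a rough illustration of the kind of network described above, the following minimal Python sketch builds a toy graph with sound nodes and tag nodes and ranks nodes by shortest-path cost from a query node. Everything in it is an assumption made for illustration: the node names, the edge costs, the helpers `add_edge`, `shortest_distances`, and `rank`, and the use of shortest-path distance as the ranking score are not taken from the abstract. In a real system the sound-to-sound costs would come from acoustic feature similarity, the tag-to-sound costs from weights learned from user tags, and the tag-to-tag costs from a semantic similarity measure.

```python
import heapq


def add_edge(graph, a, b, cost):
    """Add an undirected edge with a non-negative cost (lower = more related)."""
    graph.setdefault(a, {})[b] = cost
    graph.setdefault(b, {})[a] = cost


def shortest_distances(graph, source):
    """Dijkstra's algorithm over a dict-of-dicts graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def rank(graph, query, prefix):
    """Rank reachable nodes whose name starts with `prefix`, closest first."""
    dist = shortest_distances(graph, query)
    hits = [(d, n) for n, d in dist.items() if n.startswith(prefix) and n != query]
    return [n for d, n in sorted(hits)]


# Hypothetical toy network; all costs are made up for illustration.
graph = {}
add_edge(graph, "sound:rain1", "sound:rain2", 0.2)     # acoustically similar
add_edge(graph, "sound:rain2", "sound:traffic1", 2.0)  # acoustically distant
add_edge(graph, "tag:rain", "sound:rain1", 0.3)        # user-provided tag
add_edge(graph, "tag:car", "sound:traffic1", 0.4)      # user-provided tag
add_edge(graph, "tag:rain", "tag:car", 3.0)            # weak semantic link

# Out-of-vocabulary tag: attach it to an existing tag through an
# assumed semantic-similarity cost, with no direct link to any sound.
add_edge(graph, "tag:drizzle", "tag:rain", 0.5)

# Text-based retrieval: "sound:rain2" has no tag of its own but is
# reachable through its acoustic neighbor.
sounds_for_rain = rank(graph, "tag:rain", "sound:")
sounds_for_drizzle = rank(graph, "tag:drizzle", "sound:")

# Annotation: rank candidate tags for the unlabeled sound.
tags_for_rain2 = rank(graph, "sound:rain2", "tag:")

print(sounds_for_rain)
print(sounds_for_drizzle)
print(tags_for_rain2)
```

The same traversal serves both directions in this sketch: starting from a tag node gives text-based retrieval, and starting from a sound node gives annotation or, restricted to sound nodes, query-by-example.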
