Search Result Clustering in Collaborative Sound Collections

The large size of nowadays' online multimedia databases makes retrieving their content a difficult and time-consuming task. Users of online sound collections typically submit search queries that express a broad intent, often making the system return large and unmanageable result sets. Search Result Clustering is a technique that organises search-result content into coherent groups, which allows users to identify useful subsets in their results. Obtaining coherent and distinctive clusters that can be explored with a suitable interface is crucial for making this technique a useful complement of traditional search engines. In our work, we propose a graph-based approach using audio features for clustering diverse sound collections obtained when querying large online databases. We propose an approach to assess the performance of different features at scale, by taking advantage of the metadata associated with each sound. This analysis is complemented with an evaluation using ground-truth labels from manually annotated datasets. We show that using a confidence measure for discarding inconsistent clusters improves the quality of the partitions. After identifying the most appropriate features for clustering, we conduct an experiment with users performing a sound design task, in order to evaluate our approach and its user interface. A qualitative analysis is carried out including usability questionnaires and semi-structured interviews. This provides us with valuable new insights regarding the features that promote efficient interaction with the clusters.

[1]  Mike Thelwall,et al.  Synthesis Lectures on Information Concepts, Retrieval, and Services , 2009 .

[2]  Zahir Tari,et al.  A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis , 2014, IEEE Transactions on Emerging Topics in Computing.

[3]  Allison Druin,et al.  Technology probes: inspiring design for and with families , 2003, CHI '03.

[4]  Xavier Serra,et al.  Freesound technical demo , 2013, ACM Multimedia.

[5]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[6]  Martin Aumüller,et al.  ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms , 2018, SISAP.

[7]  Paul Taylor,et al.  Automatically clustering similar units for unit selection in speech synthesis , 1997, EUROSPEECH.

[8]  Gerard Roma Trepat Algorithms and representations for supporting online music creation with large-scale audio databases , 2015 .

[9]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[11]  Xavier Serra,et al.  Sound Sharing and Retrieval , 2018 .

[12]  James Bailey,et al.  Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance , 2010, J. Mach. Learn. Res..

[13]  Dawid Weiss,et al.  A survey of Web clustering engines , 2009, CSUR.

[14]  Perfecto Herrera-Boyer,et al.  Automatic Classification of Musical Instrument Sounds , 2003 .

[15]  Hui Xiong,et al.  Understanding of Internal Clustering Validation Measures , 2010, 2010 IEEE International Conference on Data Mining.

[16]  Jens Edlund,et al.  A Tool for Exploring Large Amounts of Found Audio Data , 2018, DHN.

[17]  Anssi Klapuri,et al.  Musical instrument recognition using cepstral coefficients and temporal features , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[18]  Lawrence Cayton,et al.  Fast nearest neighbor retrieval for bregman divergences , 2008, ICML '08.

[19]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[20]  Michael Cardew-Hall,et al.  The folksonomy tag cloud: when is it useful? , 2008, J. Inf. Sci..

[21]  Xavier Serra,et al.  End-to-end Learning for Music Audio Tagging at Scale , 2017, ISMIR.

[22]  Xavier Serra,et al.  Freesound Datasets: A Platform for the Creation of Open Audio Datasets , 2017, ISMIR.

[23]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[24]  Andreas Rauber,et al.  CONTENT-BASED ORGANIZATION OF DIGITAL AUDIO COLLECTIONS , 2005 .

[25]  Yingjie Tian,et al.  A Comprehensive Survey of Clustering Algorithms , 2015, Annals of Data Science.

[26]  T. Caliński,et al.  A dendrite method for cluster analysis , 1974 .

[27]  Kanit Wongsuphasawat,et al.  Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations , 2016, IEEE Transactions on Visualization and Computer Graphics.

[28]  Mark Sanderson,et al.  Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press 2008. ISBN-13 978-0-521-86571-5, xxi + 482 pages , 2010, Natural Language Engineering.

[29]  George Tzanetakis,et al.  Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[30]  Kai Li,et al.  Efficient k-nearest neighbor graph construction for generic similarity measures , 2011, WWW.

[31]  Xavier Serra,et al.  Essentia: An Audio Analysis Library for Music Information Retrieval , 2013, ISMIR.

[32]  Aren Jansen,et al.  Large-scale audio event discovery in one million YouTube videos , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[33]  Aren Jansen,et al.  Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[34]  Patrick Susini,et al.  The Timbre Toolbox: extracting audio descriptors from musical signals. , 2011, The Journal of the Acoustical Society of America.

[35]  A. Rauber,et al.  Bringing Mobile Map-Based Access to Digital Audio to the End User , 2007, 14th International Conference of Image Analysis and Processing - Workshops (ICIAPW 2007).

[36]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[37]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[38]  Daniel Tunkelang,et al.  Faceted Search , 2009, Synthesis Lectures on Information Concepts, Retrieval, and Services.

[39]  James Bailey,et al.  Adjusting for Chance Clustering Comparison Measures , 2015, J. Mach. Learn. Res..

[40]  P. van Kranenburg,et al.  International Society for Music Information Retrieval , 2014 .

[41]  Mark Sandler,et al.  Transfer Learning for Music Classification and Regression Tasks , 2017, ISMIR.

[42]  Maria E. Niessen,et al.  Hierarchical modeling using automated sub-clustering for sound event recognition , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[43]  Yiannis Kompatsiaris,et al.  Graph-based multimodal clustering for social multimedia , 2017, Multimedia Tools and Applications.

[44]  Anna Xambó,et al.  Factors in human recognition of timbre lexicons generated by data clustering , 2012 .

[45]  Jeffrey J. Scott,et al.  MUSIC EMOTION RECOGNITION: A STATE OF THE ART REVIEW , 2010 .

[46]  George Tzanetakis,et al.  MARSYAS3D: A PROTOTYPE AUDIO BROWSER-EDITOR USING A LARGE SCALE IMMERSIVE VISUAL AND AUDIO DISPLAY , 2001 .

[47]  Mathieu Lagrange,et al.  Polyphonic Instrument Recognition Using Spectral Clustering , 2007, ISMIR.

[48]  Aren Jansen,et al.  CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).