DESIGN AND EVALUATION OF A VISUALIZATION INTERFACE FOR QUERYING LARGE UNSTRUCTURED SOUND DATABASES

Search is an underestimated problem that plays a central role in any application dealing with large databases. The more extensive and heterogeneous our data, the harder it is to find exactly what we are looking for. This idea echoes the data availability paradox stated by Woods [45]: "more and more data is available, but our ability to interpret what is available has not increased". The question then arises: is it really useful to collect a large dataset if we lack the ability to navigate it successfully? According to Morville and Callender [27], search is a grand challenge that can be met with courage and vision. A good search tool radically improves how well we can exploit our information resources. Commonly used search methods must therefore evolve: the goal of search is more than finding, and search should become a conversational process in which answers change the questions. It therefore seems clear that substantial effort should be invested in the research and design of appropriate tools for finding our needles in the haystack. Search, however, has no general solution; it must be adapted to the context of the information at hand, which in the present document is unstructured sound databases.

The aim of this thesis is the design of a visualization interface that lets users graphically define queries over the Freesound Project database (http://www.freesound.org) and retrieve results suitable for a musical context. Music Information Retrieval (MIR) techniques are used to analyze all the files in the database and automatically extract audio features covering four aspects of sound perception: temporal envelope, timbre, tonal information, and pitch. Users perform queries by graphically specifying a target for each of these perceptual aspects; that is, queries are defined in terms of the physical properties of the sound itself rather than by indicating its source (as is usually done in text-based search engines). A similarity search is then performed over the whole database to find the most similar sound files, and the returned results are presented as points in a two-dimensional space that users can explore.
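To make the retrieval process concrete, the following minimal Python sketch illustrates the two steps the abstract describes: a weighted nearest-neighbour search over per-aspect feature vectors, followed by a two-dimensional projection of the results for display. The aspect names follow the text; the feature dimensionalities, the weighted Euclidean distance, and the PCA-based layout are illustrative assumptions rather than the thesis's actual implementation, which could equally rely on another metric or projection such as a self-organizing map [14] or t-SNE [33].

import numpy as np

# The four perceptual aspects named in the text. Feature dimensionalities
# and the distance/projection choices below are illustrative assumptions.
ASPECTS = ("envelope", "timbre", "tonal", "pitch")

def similarity_search(query, database, weights=None, k=10):
    """Rank every sound by weighted per-aspect distance to the query target.

    query and each value of database map aspect name -> feature vector;
    weights lets a user emphasize one perceptual aspect over another.
    """
    weights = weights or {a: 1.0 for a in ASPECTS}
    scored = sorted(
        (sum(weights[a] * np.linalg.norm(query[a] - sound[a]) for a in ASPECTS),
         sound_id)
        for sound_id, sound in database.items()
    )
    return [sound_id for _, sound_id in scored[:k]]

def layout_2d(database, ids):
    """Project the retrieved sounds onto two dimensions (PCA) for display."""
    X = np.stack([np.concatenate([database[i][a] for a in ASPECTS]) for i in ids])
    X -= X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)  # principal axes
    return X @ vt[:2].T  # one (x, y) point per retrieved sound

# Toy usage with random descriptors standing in for real MIR features.
rng = np.random.default_rng(0)
db = {i: {a: rng.random(8) for a in ASPECTS} for i in range(500)}
target = {a: rng.random(8) for a in ASPECTS}
hits = similarity_search(target, db, k=20)
points = layout_2d(db, hits)  # coordinates for the explorable 2-D view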

[1] N. Scaringella, et al. Automatic genre classification of music content: a survey, 2006, IEEE Signal Processing Magazine.

[2] Graham E. Poliner, et al. Melody Transcription From Music Audio: Approaches and Evaluation, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[3] Paul M. Brossier, et al. Automatic annotation of musical audio for interactive applications, 2006.

[4] Perfecto Herrera, et al. Morphological Sound Description: Computational Model and Usability Evaluation, 2004.

[5] Marcia J. Bates, et al. Where should the person stop and the information search interface start?, 1990, Information Processing & Management.

[6] George Tzanetakis, et al. Visualization in Audio-Based Music Information Retrieval, 2006, Computer Music Journal.

[7] Emilia Gómez, et al. Tonal Description of Polyphonic Audio for Music Content Processing, 2006, INFORMS Journal on Computing.

[8] Elias Pampalk, et al. Content-based organization and visualization of music archives, 2002, MULTIMEDIA '02.

[9] Michael Evans, et al. Music Information Retrieval in Broadcasting: Some Visual Applications, 2007.

[10] Emilie M. Roth, et al. Can We Ever Escape from Data Overload? A Cognitive Systems Diagnosis, 1999.

[11] Xavier Serra, et al. Freesound Radio: supporting music creation by exploration of a sound database, 2009.

[12] Ben Shneiderman, et al. Dynamic queries for visual information seeking, 1994, IEEE Software.

[13] Karrie Karahalios, et al. Seeing More: Visualizing Audio Cues, 2007, INTERACT.

[14] Teuvo Kohonen, et al. The self-organizing map, 1990.

[15] Jörn Loviscach, et al. SonoSketch: Querying Sound Effect Databases through Painting, 2009.

[16] Diemo Schwarz. The Caterpillar System for Data-Driven Concatenative Sound Synthesis, 2003.

[17] Hideki Kawahara, et al. YIN, a fundamental frequency estimator for speech and music, 2002, The Journal of the Acoustical Society of America.

[18] Oren Etzioni, et al. Grouper: A Dynamic Clustering Interface to Web Search Results, 1999, Computer Networks.

[19] E. Brazil, et al. Audio information browsing with the Sonic Browser, 2003, Proceedings of the International Conference on Coordinated and Multiple Views in Exploratory Visualization (CMV 2003).

[20] David Temperley, et al. What's Key for Key? The Krumhansl-Schmuckler Key-Finding Algorithm Reconsidered, 1999.

[21] Daniel A. Keim, et al. Information Visualization and Visual Data Mining, 2002, IEEE Transactions on Visualization and Computer Graphics.

[22] Ganesh S. Oak. Information Visualization Introduction, 2022.

[23] Jonathan Berger, et al. Application of Raster Scanning Method to Image Sonification, Sound Visualization, Sound Analysis and Synthesis, 2006.

[24] Jörn Loviscach, et al. SoundTorch: Quick Browsing in Large Audio Collections, 2008.

[25] Pierre Schaeffer. Traité des objets musicaux, 1966.

[26] Wayne Slawson. The Color of Sound: A Theoretical Study in Musical Timbre, 1981.

[27] William W. Gaver. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception, 1993.

[28] Masataka Goto, et al. A chorus-section detecting method for musical audio signals, 2003, Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[29] J. Grey. Multidimensional perceptual scaling of musical timbres, 1977, The Journal of the Acoustical Society of America.

[30] Markus Koppenberger, et al. Knowledge and Content-Based Audio Retrieval Using WordNet, 2004, ICETE.

[31] Gary Marchionini, et al. Relation Browser++: an information exploration and searching tool, 2004, DG.O.

[32] George Tzanetakis, et al. Enhancing Sonic Browsing Using Audio Information Retrieval, 2002.

[33] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.

[35] Marc Leman, et al. Content-Based Music Information Retrieval: Current Directions and Future Challenges, 2008, Proceedings of the IEEE.

[36] Òscar Celma, et al. Sound Effect Taxonomy Management in Production Environments, 2004.

[37] Robin Jeffries, et al. User interface evaluation in the real world: a comparison of four techniques, 1991, CHI.

[38] Perfecto Herrera, et al. Automatic Detection of Emotion in Music: Interaction with Emotionally Sensitive Machines, 2018.

[39] F. Pachet, et al. Musical Mosaicing, 2001.

[40] Gary M. Olson, et al. The growth of cognitive modeling in human-computer interaction since GOMS, 1990.

[41] Steve Jones. Graphical query specification and dynamic result previews for a digital library, 1998, UIST '98.

[42] Ben Shneiderman, et al. Facilitating data exploration with query previews: A study of user performance and preference, 2000, Behaviour & Information Technology.

[43] Ben Shneiderman, et al. Designing the User Interface, 2013.

[44] Allen Newell, et al. The psychology of human-computer interaction, 1983.

[45] Simon Dixon, et al. A Review of Automatic Rhythm Description Systems, 2005, Computer Music Journal.