Searching for Audio by Sketching Mental Images of Sound: A Brave New Idea for Audio Retrieval in Creative Music Production

We propose a new paradigm for searching for sound: allowing users to graphically sketch their mental representation of a sound as a query. Through interviews with professional music producers and creators, we find that existing text-based indexing and retrieval methods, which rely on file names and tags to search for sound material in large collections (e.g., sample databases), do not reflect the producers' mental concepts. These concepts are often rooted in the visual domain, so text-based methods are far removed from the producers' actual needs, work practices, and intuition. As a consequence, when producers create new music from existing sounds, finding those sounds is cumbersome and breaks their workflow, because they are forced to resort to browsing the collection. Prior work on organizing sound repositories that aims to bridge this conceptual gap between sound and vision builds either on psychological findings (often alluding to synaesthetic phenomena) or on ad-hoc, technology-driven mappings. These methods primarily aim to visualize the contents of collections or of individual sounds and, again, to facilitate browsing them; they have not yet been applied to indexing and querying. We argue that users want a search system that supports visual queries to audio collections, and that developing such a system should inform and drive future research in audio retrieval. To explore this notion, we test the idea of a sketch interface with music producers in a semi-structured interview process, using a physical, non-functional prototype. Based on the outcomes of this study, we propose a conceptual software prototype for visually querying sound repositories using image manipulation metaphors.
