Automatic annotation of musical audio for interactive applications

As computing devices become more portable and more embedded in everyday life, developing interactive and ubiquitous systems has become an important aspect of the new music applications created by the research community. We are interested in developing a robust layer for the automatic annotation of audio signals, to be used in various applications, from music search engines to interactive installations, and in various contexts, from embedded devices to audio content servers. We propose adaptations of existing signal processing techniques to a real-time context. Amongst these annotation techniques, we concentrate on low- and mid-level tasks such as onset detection, pitch tracking, tempo extraction and note modelling. We present a framework to extract these annotations and evaluate the performance of different algorithms. The first task is to detect onsets and offsets in audio streams with short latency. The segmentation of audio streams into temporal objects enables various manipulations and the analysis of metrical structure. We describe the evaluation of different algorithms and their adaptation to real time. We then tackle the problem of fundamental frequency estimation, again trying to reduce both the delay and the computational cost. Different algorithms are implemented for real time and tested on monophonic recordings and complex signals. Spectral analysis can be used to label the temporal segments; we also approach the estimation of higher-level descriptions. Techniques for the modelling of note objects and the localisation of beats are implemented and discussed. Applications of our framework include live and interactive music installations and, more generally, tools for composers and sound engineers. Speed optimisations may bring a significant improvement to various automated tasks, such as automatic classification and recommendation systems.
We describe the design of our software solution, both for our research purposes and in view of its integration within other systems.
