Melody Extraction and Musical Onset Detection from Framewise STFT Peak Data

We propose a probabilistic method for the joint segmentation and melody extraction of musical audio signals that arise from a monophonic score. The method operates on framewise short-time Fourier transform (STFT) peaks, which makes inference of note onset, offset, and pitch attributes computationally efficient while retaining sufficient information for pitch determination and spectral change detection. The system explicitly models note events in terms of transient and steady-state regions, as well as possible gaps between note events. In this way, the system readily distinguishes abrupt spectral changes associated with musical onsets from other abrupt change events. Additionally, the method may incorporate melodic context by modeling note-to-note dependences. The method is successfully applied to a variety of piano and violin recordings containing reverberation, effective polyphony due to legato playing style, expressive pitch variations, and background voices. While the method does not provide a sample-accurate segmentation, it facilitates such a segmentation in subsequent processing by isolating musical onsets to frame neighborhoods and identifying possible pitch content before and after the true onset sample location.
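To make the input representation concrete, the following is a minimal sketch of how framewise STFT peaks might be extracted from a signal. It is not the authors' implementation: the function name `stft_peaks`, the Hann window, the frame length, hop size, and the rule of keeping the `max_peaks` strongest local maxima are all assumptions chosen for illustration.

```python
import numpy as np

def stft_peaks(signal, sample_rate, frame_len=1024, hop=512, max_peaks=5):
    """Return, for each frame, a list of (frequency_hz, magnitude) peaks.

    Illustrative only: window, frame size, and peak-picking rule are
    assumptions, not the procedure used in the paper.
    """
    window = np.hanning(frame_len)
    peaks_per_frame = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        # Local maxima: bins strictly greater than both neighbours.
        idx = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]))[0] + 1
        # Keep the strongest few peaks, then restore frequency order.
        idx = idx[np.argsort(mag[idx])[::-1][:max_peaks]]
        idx.sort()
        freqs = idx * sample_rate / frame_len
        peaks_per_frame.append(list(zip(freqs.tolist(), mag[idx].tolist())))
    return peaks_per_frame

# Usage: one second of a 440 Hz sine sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
frames = stft_peaks(tone, sr)
strongest = max(frames[0], key=lambda p: p[1])
```

Each frame is reduced to a handful of (frequency, magnitude) pairs, which is far less data than the full spectrum. A model of the kind described above would then consume these per-frame peak lists rather than the raw samples.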
