Automatic transcription of pitched and unpitched sounds from polyphonic music

Automatic transcription of polyphonic music has been an active research field for several years and is considered by many to be a key enabling technology in music signal processing. However, current transcription approaches focus either on detecting pitched sounds (from pitched musical instruments) or on detecting unpitched sounds (from drum kits). In this paper, we propose a method that jointly transcribes pitched and unpitched sounds from polyphonic music recordings. The proposed model extends the probabilistic latent component analysis algorithm and supports the detection of pitched sounds from multiple instruments as well as the detection of unpitched sounds from drum kit components, including bass drums, snare drums, cymbals, hi-hats, and toms. Experiments on polyphonic Western music containing both pitched and unpitched instruments yield encouraging results in both multi-pitch detection and drum transcription tasks.
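The core building block the abstract refers to, probabilistic latent component analysis (PLCA), models a normalized magnitude spectrogram as a mixture of latent spectral components estimated via expectation-maximization. The sketch below shows a generic two-dimensional PLCA decomposition, P(f,t) = Σ_z P(z) P(f|z) P(t|z); it is a minimal illustration of the base algorithm, not the paper's extended joint pitched/unpitched model, and the function name and parameters are illustrative.

```python
import numpy as np

def plca(V, n_components, n_iter=100, seed=0):
    """Two-dimensional PLCA sketch: factor a magnitude spectrogram
    V (freq x time) as P(f,t) = sum_z P(z) P(f|z) P(t|z) via EM.
    Generic base algorithm only, not the paper's extended model."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    V = V / V.sum()  # treat the spectrogram as a joint distribution
    Pz = np.full(n_components, 1.0 / n_components)
    Pf_z = rng.random((F, n_components)); Pf_z /= Pf_z.sum(axis=0)
    Pt_z = rng.random((T, n_components)); Pt_z /= Pt_z.sum(axis=0)
    for _ in range(n_iter):
        # E-step: posterior over latent components, shape (F, T, Z)
        joint = Pz[None, None, :] * Pf_z[:, None, :] * Pt_z[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: reweight posterior by the observed distribution
        W = V[:, :, None] * post
        Pz = W.sum(axis=(0, 1))                      # component priors
        Pf_z = W.sum(axis=1) / (Pz[None, :] + 1e-12)  # spectral templates
        Pt_z = W.sum(axis=0) / (Pz[None, :] + 1e-12)  # activations
    return Pz, Pf_z, Pt_z
```

In transcription settings, the spectral templates P(f|z) are typically tied to pitches or drum components and the activations P(t|z) indicate when each sounds; shift-invariant and multi-instrument extensions, as in the paper, enlarge this basic factorization with additional latent variables.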
