Transcription of Polyphonic Vocal Music with a Repetitive Melodic Structure

This paper presents a novel method for the transcription of folk music that exploits its specific characteristics to improve transcription accuracy. In contrast to most commercial music, folk music recordings may contain various inaccuracies, as they are usually performed by amateur musicians and recorded in the field. When standard transcription approaches are used, these inaccuracies lead to erroneous pitch estimates. On the other hand, the structure of Western folk music is usually simple, as songs are often composed of repeated melodic parts. Our approach makes use of these repetitions to increase the robustness of transcription and improve its accuracy. The proposed method fuses three sources of information: (1) frame-based multiple-F0 estimates, (2) song structure, and (3) pitch drift estimates. It first selects a representative segment of the analyzed song and aligns all other segments to it, taking both temporal and frequency deviations into account. The information from all segments is then summarized and fed into a two-layer probabilistic model based on explicit-duration HMMs, which segments the frame-based information into notes. The method is evaluated against state-of-the-art transcription methods, and we show that a significant improvement in accuracy can be achieved.
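
As a rough illustration of the alignment-and-fusion step described above, the sketch below aligns each repeated segment to a chosen reference segment with dynamic time warping (DTW) over frame-based pitch-salience features and averages the aligned frames, so that spurious F0 estimates in individual repetitions tend to cancel out. This is a minimal sketch under assumed names (dtw_path, fuse_segments) and toy data, not the paper's implementation, which additionally models pitch drift and performs note segmentation with a two-layer explicit-duration HMM.

```python
import numpy as np

def dtw_path(ref, seg):
    """Frame-to-frame alignment path between two feature matrices
    (frames x features) using standard DTW with Euclidean frame distances."""
    n, m = len(ref), len(seg)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(ref[i - 1] - seg[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end of both sequences to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def fuse_segments(reference, segments):
    """Average the pitch-salience frames of every repeated segment onto the
    reference time axis; frames mapped to the same reference frame accumulate."""
    fused = reference.copy()
    counts = np.ones(len(reference))
    for seg in segments:
        for ri, si in dtw_path(reference, seg):
            fused[ri] += seg[si]
            counts[ri] += 1
    return fused / counts[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((50, 88))                                     # 50 frames x 88 pitch bins (toy data)
    reps = [ref + 0.3 * rng.random(ref.shape) for _ in range(3)]   # noisy repetitions of the same part
    fused = fuse_segments(ref, reps)
    print(fused.shape)                                             # (50, 88): salience stabilized across repetitions
```

In the method proper, the fused frame-level information would then be passed to the probabilistic note-segmentation stage rather than used directly.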
