Melody Extraction from Polyphonic Music Signals

Melody extraction algorithms aim to produce a sequence of frequency values corresponding to the pitch of the dominant melody from a musical recording. Over the past decade, melody extraction has emerged as an active research topic, with a large variety of proposed algorithms spanning a wide range of techniques. This article provides an overview of these techniques, the applications for which melody extraction is useful, and the challenges that remain. We start with a discussion of "melody" from both musical and signal processing perspectives and provide a case study that interprets the output of a melody extraction algorithm for specific excerpts. We then provide a comprehensive comparative analysis of melody extraction algorithms based on the results of an international evaluation campaign. We discuss issues of algorithm design, evaluation, and applications that build upon melody extraction. Finally, we discuss some of the remaining challenges in melody extraction research in terms of algorithmic performance, development, and evaluation methodology.

Introduction

Music was the first mass-market industry to be completely restructured by digital technology, starting with the compact disc and leading to today's situation, where typical consumers may have access to thousands of tracks stored locally on their smartphone or music player and millions of tracks instantly available through cloud-based music services. This vast quantity of music demands novel methods of description, indexing, searching, and interaction. Recent advances in audio processing have led to technologies that help users interact with music by directly analyzing the musical content of audio files. The extraction of melody from polyphonic music signals is one such technology, and it has received substantial attention from the audio signal processing and music information retrieval (MIR) research communities.
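To make concrete what kind of output is meant by "a sequence of frequency values corresponding to the pitch of the dominant melody," here is a minimal Python sketch of that representation. It assumes a fixed 10 ms hop between frames and marks frames with no melody as 0 Hz; both are illustrative assumptions rather than requirements of the task, and the names `HOP_SECONDS`, `hz_to_midi`, `describe_melody`, and `voicing_ratio` are hypothetical, not taken from any particular algorithm or toolkit.

```python
import math
from typing import List, Optional, Tuple

# Assumption: a fixed hop of 10 ms between analysis frames. This is an
# illustrative choice, not something the task definition prescribes.
HOP_SECONDS = 0.01


def hz_to_midi(f0_hz: float) -> Optional[float]:
    """Convert a frequency in Hz to a fractional MIDI note number.

    Non-positive values are treated as "no melody in this frame"
    (an assumed convention for this sketch) and return None.
    """
    if f0_hz <= 0.0:
        return None
    # A4 = 440 Hz = MIDI 69; each semitone is a factor of 2**(1/12).
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def describe_melody(
    f0_sequence: List[float],
) -> List[Tuple[float, float, Optional[float]]]:
    """Pair each frame's f0 estimate with its timestamp and MIDI pitch."""
    return [
        (round(i * HOP_SECONDS, 3), f0, hz_to_midi(f0))
        for i, f0 in enumerate(f0_sequence)
    ]


def voicing_ratio(f0_sequence: List[float]) -> float:
    """Fraction of frames in which a melody pitch is reported."""
    if not f0_sequence:
        return 0.0
    voiced = sum(1 for f0 in f0_sequence if f0 > 0.0)
    return voiced / len(f0_sequence)


if __name__ == "__main__":
    # Hypothetical extractor output: A4 twice, one frame with no melody, then C5.
    estimates = [440.0, 440.0, 0.0, 523.25]
    for t, f0, midi in describe_melody(estimates):
        label = "no melody" if midi is None else f"MIDI {midi:.2f}"
        print(f"{t:.2f} s  {f0:7.2f} Hz  {label}")
    print(f"voiced frames: {voicing_ratio(estimates):.0%}")
```

Mapping frequencies onto a logarithmic pitch scale such as MIDI note numbers (or, equivalently, cents) is a natural choice here, because equal musical intervals then correspond to equal numeric distances, which simplifies comparing an estimated pitch sequence against a reference.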
Known as melody extraction, audio melody extraction, predominant melody extraction, predominant melody estimation, or predominant fundamental frequency estimation, the task involves automatically obtaining a sequence of frequency values representing the pitch of the dominant melodic line from recorded music audio signals (Figure 1). Music transcription, i.e., converting an audio signal into a description of all the notes being played, is a task that can usually be achieved by a trained student of music and has long been a topic of computational research. It has, however, proven to be very difficult due to the complex and deliberately overlapped