Melody pitch estimation based on range estimation and candidate extraction using harmonic structure model

This paper proposes an algorithm that estimates the melody pitch line (the most dominant pitch sequence) of a polyphonic audio signal. The algorithm is based on melody range estimation and on pitch candidate extraction using a harmonic structure model similar to the one proposed by Goto. This paper defines the melody pitch candidates as the list of pitches whose harmonic models best fit the polyphonic audio. Many previously proposed melody extraction algorithms use a multiple-pitch extractor (MPE) to obtain melody pitch candidates. However, an MPE is designed to estimate all pitches within a frame of polyphonic audio and does not necessarily yield melody pitch candidates. The weights of the harmonic structure model, which must be estimated in order to extract the pitch candidates, are prone to octave errors and to strong low-frequency interference, so the estimates must be refined. As a refinement, the algorithm measures the degree of harmonic fitness of each candidate. In addition, a melody pitch range is estimated to reduce false-positive pitch candidates. This range is derived from the distribution of the best pitch candidates that persist over long durations. Experimental results show that the proposed algorithm outperformed many previously proposed algorithms.
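The candidate-extraction and range-estimation steps described above can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the paper's implementation. It uses a simplified PreFEst-style model in which each candidate F0 `F` has a fixed harmonic tone model p(x|F) (Gaussians at F + 1200·log2(h) cents with geometrically decaying amplitudes), and EM estimates only the mixture weights w(F). The tone-model parameters (`n_harm`, `width`, `decay`), the range rule in `estimate_melody_range` (runs of at least `min_len` frames that stay within `tol` cents, followed by 5th/95th percentiles), and all function names are hypothetical. The harmonic-fitness refinement is not sketched.

```python
import numpy as np


def cents(f_hz, ref_hz=27.5):
    """Convert Hz to cents above ref_hz (27.5 Hz = A0; an arbitrary choice)."""
    return 1200.0 * np.log2(np.asarray(f_hz, dtype=float) / ref_hz)


def tone_model(x, f0_cents, n_harm=8, width=30.0, decay=0.8):
    """Harmonic tone model p(x|F) on a cent grid x: one Gaussian per harmonic h
    at F + 1200*log2(h) cents, with geometrically decaying weights c(h).
    n_harm, width and decay are illustrative assumptions, not the paper's values."""
    h = np.arange(1, n_harm + 1)
    c = decay ** (h - 1)
    centers = f0_cents + 1200.0 * np.log2(h)
    g = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    m = g @ c
    return m / m.sum()  # normalize so the model sums to 1 over the grid


def estimate_weights(obs, models, n_iter=50, eps=1e-12):
    """EM estimate of mixture weights w(F) so that sum_F w(F) p(x|F) fits obs(x).
    obs: (X,) non-negative spectrum on the cent grid; models: (K, X), rows sum to 1."""
    p = obs / obs.sum()
    w = np.full(models.shape[0], 1.0 / models.shape[0])  # uniform start
    for _ in range(n_iter):
        mix = w @ models + eps  # current mixture density, shape (X,)
        # EM update: w'(F) = sum_x p(x) * w(F) p(x|F) / mix(x)
        w = w * (models @ (p / mix))
        w /= w.sum()
    return w


def top_candidates(w, f0_grid, k=3):
    """Return the k candidate F0s (in cents) with the largest weights."""
    order = np.argsort(w)[::-1][:k]
    return f0_grid[order]


def estimate_melody_range(best_pitch, min_len=5, tol=50.0, lo_q=5, hi_q=95):
    """Hypothetical range estimator: keep frames from runs in which the best
    candidate stays within `tol` cents of the run's first frame for at least
    `min_len` frames, then return the (lo_q, hi_q) percentiles of those pitches."""
    best_pitch = np.asarray(best_pitch, dtype=float)
    kept, start, n = [], 0, len(best_pitch)
    for i in range(1, n + 1):
        if i == n or abs(best_pitch[i] - best_pitch[start]) > tol:
            if i - start >= min_len:  # run is long enough to count
                kept.extend(best_pitch[start:i])
            start = i
    if not kept:
        return None
    return float(np.percentile(kept, lo_q)), float(np.percentile(kept, hi_q))


if __name__ == "__main__":
    x = np.arange(0.0, 8400.0, 10.0)  # cent grid, 10-cent bins
    f0_grid = np.arange(1200.0, 4800.0, 10.0)  # candidate F0s, 55 Hz to ~437 Hz
    models = np.stack([tone_model(x, f) for f in f0_grid])
    obs = tone_model(x, float(cents(220.0)))  # synthetic 220 Hz harmonic tone
    w = estimate_weights(obs, models)
    print(top_candidates(w, f0_grid, k=3))
    print(estimate_melody_range([5000] * 10 + [2000] * 2 + [5200] * 10))
```

Because EM spreads weight over neighboring F0 bins and octave-related candidates, keeping the few largest weights (rather than only the maximum) gives a candidate list that a later refinement stage can prune. The range returned by `estimate_melody_range` could then be used to discard candidates that fall outside it; in the example call, the short two-frame excursion to 2000 cents is ignored.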