Study of automatic melody extraction methods for Philippine indigenous music

In this study, we compared two methods for extracting the melody pitch from selected Philippine indigenous music. Pitch is expressed as the fundamental frequency of the main melodic voice or lead instrument of a music sample. Our implementation of automatic melody extraction involves blind source separation followed by pitch detection. For blind source separation, we implemented the Harmonic-Percussive Source Separation (HPSS) algorithm and the Shifted Non-negative Matrix Factorization (SNMF) algorithm. HPSS identifies the harmonic component from the prominent peaks in the spectrogram of a signal, while SNMF uses timbre as its separation criterion. The harmonic component is then used to estimate the melody pitch. The HPSS and SNMF source separation algorithms are complemented with salience-based and data-driven pitch detection algorithms, respectively. The two systems were evaluated using ten samples of Philippine indigenous music. After source separation, the estimated harmonic and percussive tracks were assessed through subjective listening tests, which show that SNMF performs better than HPSS for harmonic and percussive source separation. Moreover, objective tests using standard metrics indicate that the salience-based approach identifies the melody more accurately than the data-driven approach.
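The pipeline described above (separate the harmonic content, then track its fundamental frequency) can be sketched with off-the-shelf tools. The Python snippet below is a minimal illustration using librosa: its median-filtering HPSS and the pYIN tracker stand in for the separation and pitch-detection stages described here, not the exact algorithms evaluated in the study, and the input file name and C2-C6 melody range are assumptions for the example.

# Illustrative sketch only: librosa's HPSS and pYIN stand in for the stages
# described in the abstract; they are not the study's exact implementations.
import librosa
import numpy as np

# Hypothetical input file; any mono recording from the test set would do here.
y, sr = librosa.load("sample.wav", sr=None, mono=True)

# Step 1: harmonic/percussive separation. librosa.effects.hpss wraps an STFT,
# median-filter soft masking, and an inverse STFT of each masked component.
y_harmonic, y_percussive = librosa.effects.hpss(y)

# Step 2: melody pitch estimation on the harmonic component. pYIN returns a
# frame-wise f0 track together with a per-frame voicing decision.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y_harmonic,
    fmin=librosa.note_to_hz("C2"),  # assumed lower bound of the melody (~65 Hz)
    fmax=librosa.note_to_hz("C6"),  # assumed upper bound (~1047 Hz)
    sr=sr,
)

# Keep only the voiced frames as the estimated melody contour (time, f0 pairs).
times = librosa.times_like(f0, sr=sr)
melody = np.column_stack([times[voiced_flag], f0[voiced_flag]])
print(f"Estimated {len(melody)} voiced melody frames")

Swapping the separation front end (for example, an SNMF-based decomposition) only changes the first step; the pitch-detection stage consumes whatever harmonic estimate the separation produces.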
