Deep Learning Models for Melody Perception: An Investigation on Symbolic Music Data

We investigate deep learning approaches to the melody extraction problem on symbolic music data. Specifically, we compare two approaches: the first employs recurrent neural networks (RNN) and treats melody extraction as a sequence prediction problem, while the second employs fully convolutional networks (FCN) and treats it as an image semantic segmentation problem. Both methods are tested on a MIDI dataset whose melody tracks serve as ground truth. A more challenging case, in which the melodies are shifted by one octave, is also considered. Evaluation results show that the semantic segmentation approach achieves higher accuracy.

Li Su | Wei Tsung Lu
