Learning Similarity Metrics for Melody Retrieval

Similarity measures are indispensable in music information retrieval. In recent years, various proposals have been made for measuring melodic similarity in symbolically encoded scores. Many of these approaches ultimately rely on dynamic programming techniques such as sequence alignment or edit distance, which have several drawbacks. First, the similarity scores are not necessarily metrics and are not directly comparable. Second, the algorithms are mostly first-order and have quadratic time complexity. Third, the features and weights need to be defined precisely. We propose an alternative approach which employs deep neural networks for end-to-end similarity metric learning. We compare different recurrent neural architectures (LSTM and GRU) for representing symbolic melodies as continuous vectors, and demonstrate how duplet and triplet loss functions can be employed to learn compact distributional representations of symbolic music in an induced melody space. We contrast this approach with an alignment-based approach. We present results for the Meertens Tune Collections, which consist of a large number of vocal and instrumental monophonic pieces from Dutch musical sources, spanning five centuries, and demonstrate the robustness of the learned similarity metrics.
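To make the duplet and triplet objectives concrete, the following is a minimal sketch of how such losses are commonly defined over embedding vectors. The abstract does not give the exact formulations, so the squared-distance contrastive form, the Euclidean distance, the margin value, and all function names here are assumptions for illustration. The recurrent encoder (LSTM or GRU) that would produce the vectors is omitted, and the toy 2-d vectors are made-up values standing in for its output.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def duplet_loss(u, v, same_family, margin=1.0):
    """Pairwise (contrastive) loss, one common form (an assumption here).

    Matching pairs are penalised by their squared distance; non-matching
    pairs are penalised only if they lie closer than `margin`.
    """
    d = euclidean(u, v)
    if same_family:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: the anchor should be closer to the positive than to
    the negative by at least `margin`; otherwise the shortfall is the loss."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical embeddings of three melodies (illustrative values only).
anchor = [0.0, 0.0]
positive = [0.0, 0.5]   # assumed to be a variant of the same tune
negative = [3.0, 4.0]   # assumed to be an unrelated tune

well_separated = triplet_loss(anchor, positive, negative)
badly_separated = triplet_loss(anchor, negative, positive)
print(well_separated, badly_separated)
```

In this sketch the loss is zero once related melodies are already closer than unrelated ones by the margin, so training effort concentrates on triplets that violate the desired ordering. Because the learned similarity is a distance between fixed-length vectors, comparing two encoded melodies does not require a quadratic-time alignment.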
