From context to concept: exploring semantic relationships in music with word2vec

We explore the potential of a popular distributional semantics vector space model, word2vec, for capturing meaningful relationships in ecological (complex polyphonic) music. More precisely, the skip-gram version of word2vec is used to model slices of music from a large corpus spanning eight musical genres. In this newly learned vector space, a metric based on cosine distance is able to distinguish between functional chord relationships, as well as harmonic associations in the music. Evidence based on cosine distance between chord-pair vectors suggests that an implicit circle of fifths exists in the vector space. In addition, a comparison between pieces in different keys reveals that key relationships are represented in word2vec space. These results suggest that the learned embedded vector representation does in fact capture tonal and harmonic characteristics of music, even though the model receives no explicit information about the musical content of the constituent slices. To investigate whether proximity in the discovered space of embeddings is indicative of ‘semantically related’ slices, we explore a music generation task in which existing slices of a given piece of music are automatically replaced with new slices. We propose an algorithm that finds substitute slices based on spatial proximity and on the pitch class distribution inferred in the chosen subspace. The results indicate that the size of the subspace used has a significant effect on whether slices belonging to the same key are selected. In sum, the proposed word2vec model is able to learn music-vector embeddings that capture meaningful tonal and harmonic relationships in music, making it a useful tool for exploring musical properties and comparing pieces, a potential input representation for deep learning models, and a music generation device.
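The Python sketch below illustrates, under assumptions of our own, three ingredients named in the abstract: skip-gram (target, context) pairs drawn from a sequence of slices, cosine distance between slice vectors, and a neighbourhood-based substitution rule. Slices are modelled here as frozensets of pitch classes (0 to 11), the 2-D toy vectors merely stand in for embeddings a trained skip-gram model would produce, and `substitute` with its parameter `k` is a hypothetical rule that reads "subspace" as the `k` nearest neighbours. It is not the authors' algorithm.

```python
import math
from collections import Counter


def skipgram_pairs(slices, window=2):
    """Return (target, context) pairs as used in skip-gram training:
    each slice is paired with every slice within `window` positions."""
    pairs = []
    for i, target in enumerate(slices):
        lo, hi = max(0, i - window), min(len(slices), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, slices[j]))
    return pairs


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def substitute(target, embeddings, k=3):
    """Hypothetical substitution rule (an assumption, not the paper's method):
    take the k slices nearest to `target` as the 'subspace', pool their pitch
    classes into a distribution, and return the neighbour whose pitch classes
    have the highest mean probability under that pool.
    Assumes k >= 1 and that every slice is a non-empty set of pitch classes."""
    neighbours = sorted(
        (s for s in embeddings if s != target),
        key=lambda s: cosine_distance(embeddings[target], embeddings[s]),
    )[:k]
    pc_counts = Counter(pc for s in neighbours for pc in s)
    total = sum(pc_counts.values())

    def score(s):
        return sum(pc_counts[pc] / total for pc in s) / len(s)

    return max(neighbours, key=score)


# Toy data: chords as pitch-class sets, vectors invented for illustration only.
C_MAJ = frozenset({0, 4, 7})
A_MIN = frozenset({9, 0, 4})
G_MAJ = frozenset({7, 11, 2})
F_MAJ = frozenset({5, 9, 0})
FS_MAJ = frozenset({6, 10, 1})
toy = {
    C_MAJ: (1.0, 0.0),
    A_MIN: (0.95, 0.05),
    G_MAJ: (0.9, 0.1),
    F_MAJ: (0.8, -0.3),
    FS_MAJ: (-1.0, 0.0),
}

print(sorted(substitute(C_MAJ, toy, k=1)))  # [0, 4, 9], i.e. A minor
```

Increasing `k` widens the neighbourhood from which both the candidate and the pooled pitch-class distribution are drawn; in this toy setting that is one way to picture the abstract's finding that subspace size affects whether same-key slices are selected.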
