Using Shared Vector Representations of Words and Chords in Music for Genre Classification

With so much music readily available for consumption today, it has never been more important to study music perception. In this paper, we represent lyrics and chords in a shared vector space using a phrase-aligned lyrics-and-chords corpus. We show that models using these shared representations predict the musical genre of songs (a perceptual construct of music listening) better than models that do not. This work adds to our understanding of how lyrics and chords interact in music and has applications in multimodal perception and music information retrieval.
