We attack a woefully under-explored language genre, lyrics in music, introducing a novel hidden Markov model based method for completely unsupervised identification of rhyme schemes in hip hop lyrics, which, to the best of our knowledge, is the first such effort. Unlike previous work that relies on supervised or semi-supervised methods for rhyme scheme identification, our model does not assume any prior phonetic or labeling information whatsoever. Also unlike previous work on rhyme scheme identification, we attack the difficult task of hip hop lyrics, in which the data is far less structured and noisier. A novel feature of our approach is that we do not manually segment the verses in lyrics according to any pre-specified rhyme scheme, but instead use a number of hidden states of varying rhyme scheme lengths to automatically impose a soft segmentation. In spite of the difficulty of the challenge, we obtained a surprisingly high precision of 35.81% and recall of 57.25% on the task of identifying the rhyming words, giving an overall F-score of 44.06%. These encouraging results were obtained in the face of highly noisy data, a lack of clear stanza segmentation, and a very wide variety of rhyme schemes used in hip hop.
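As a quick arithmetic check on the reported figures, the sketch below recomputes the F-score from the stated precision and recall. It assumes the reported score is the balanced F-score (the harmonic mean of precision and recall); the abstract does not say so explicitly, and the function name `f_score` and its `beta` parameter are illustrative choices, not part of the original work.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean of P and R."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures reported in the abstract, expressed as fractions.
precision = 0.3581
recall = 0.5725

# Assumption: the reported score is the balanced (beta = 1) F-score.
f1 = f_score(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 44.06%"
```

Under that assumption the recomputed value agrees with the reported 44.06%.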