GramError: A Quality Metric for Machine Generated Songs

This paper explores whether a simple grammar-based metric can accurately predict human opinion of machine-generated song lyrics squality. The proposed metric considers the percentage of words written in natural English and the number of grammatical errors to rate the quality of machine-generated lyrics. We use a state-of-the-art Recurrent Neural Network (RNN) model and adapt it to lyric generation by re-training on the lyrics of 5,000 songs. For our initial user trial, we use a small sample of songs generated by the RNN to calibrate the metric. Songs selected on the basis of this metric are further evaluated using “Turing-like” tests to establish whether there is a correlation between metric score and human judgment. Our results show that there is strong correlation with human opinion, especially at lower levels of song quality. They also show that 75% of the RNN-generated lyrics passed for human-generated over 30% of the time.