LyDAR: A LYrics Density based Approach to non-homogeneous music Resizing

In many scenarios, such as TV/radio advertising production, animation production, and presentations, a music piece must fit a fixed duration. For example, an editor may need to fit a 320 s song to a 280 s animation or to a 265 s radio advertisement. Current music resizing approaches scale the whole piece uniformly, which degrades the compressed song and introduces perceptual artifacts. This paper proposes a novel music resizing approach, called LyDAR (LYrics Density based Approach to non-homogeneous music Resizing), in which the resizing operation is guided by music structural analysis. First, we introduce the concept of lyrics density, which uses the lyrics to analyze the musical structure and describes how resistant each part of a song is to compression. Second, we develop two music resizing scheduling algorithms, LDF and LDGF, which schedule compression over the different parts of a music piece. Finally, we conduct both subjective and objective experiments to show that LyDAR can effectively and efficiently generate compressed versions of songs with good quality.
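The idea of non-homogeneous resizing can be illustrated with a minimal sketch. The abstract does not define lyrics density or specify LDF and LDGF, so everything below is an assumption made for illustration: lyrics density is taken to be lyric words per second, segments with lower density are assumed to tolerate more compression, and the hypothetical `allocate_compression` function and its `max_cut` parameter are a simple greedy stand-in, not the paper's algorithms.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    duration: float   # segment length in seconds
    word_count: int   # lyric words sung in this segment

    @property
    def lyrics_density(self) -> float:
        # Assumed definition (not from the paper): lyric words per second.
        return self.word_count / self.duration


def allocate_compression(segments, target_duration, max_cut=0.25):
    """Return new per-segment durations, shortening low-density segments first.

    max_cut is a hypothetical cap on the fraction removed from any one segment.
    """
    remaining = sum(s.duration for s in segments) - target_duration
    if remaining < 0:
        raise ValueError("this sketch only shortens music")

    new_durations = [s.duration for s in segments]
    # Visit segments from lowest to highest lyrics density.
    order = sorted(range(len(segments)), key=lambda i: segments[i].lyrics_density)
    for i in order:
        if remaining <= 0:
            break
        cut = min(remaining, segments[i].duration * max_cut)
        new_durations[i] -= cut
        remaining -= cut

    if remaining > 1e-9:
        raise ValueError("target not reachable within max_cut per segment")
    return new_durations


# The 320 s song from the example above, split into made-up segments.
song = [
    Segment(40.0, 0),     # intro, no lyrics
    Segment(100.0, 150),  # verse
    Segment(80.0, 160),   # chorus
    Segment(60.0, 90),    # verse
    Segment(40.0, 10),    # outro
]
new_durations = allocate_compression(song, target_duration=280.0)
print(new_durations, sum(new_durations))
```

In this sketch the instrumental intro and the outro absorb most of the 40 s reduction, while the lyric-dense chorus is left untouched. A uniform approach would instead scale every segment by 280/320 = 0.875. Each new duration would then be passed to a time-scale modification method to produce the audio.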
