In many scenarios, such as TV/radio advertising production, animation production, and presentations, music pieces are constrained in duration. For example, an editor may want a 320 s song to fit a 280 s animation or to accompany a 265 s radio advertisement. Current music resizing approaches scale the whole piece uniformly; however, uniform scaling degrades the quality of the compressed song and introduces perceptible artifacts. In this paper, a novel music resizing approach called LyDAR (LYrics Density based Approach to non-homogeneous music Resizing) is proposed, in which the resizing operation is guided by music structure analysis. First, the concept of lyrics density is presented; it exploits lyrics to analyze musical structure and describes how resistant different parts of a song are to compression. Second, two music resizing scheduling algorithms, LDF and LDGF, are developed to schedule compression across the different parts of a music piece. Finally, both subjective and objective experiments show that LyDAR can effectively and efficiently generate good-quality compressed versions of songs.
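The abstract does not define exactly how lyrics density is computed or how LDF and LDGF allocate compression. As a rough illustration only, the Python sketch below assumes that lyrics density is the number of lyric words per second within a structural segment, and that low-density segments (intros, interludes, outros) tolerate compression better than dense vocal passages. The `Segment` type, the greedy `schedule_compression` function, the `max_cut` limit, and the example segment timings are all hypothetical; this is not the paper's LDF or LDGF algorithm.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    duration: float  # seconds
    word_count: int  # number of lyric words sung in this segment


def lyrics_density(seg: Segment) -> float:
    """Hypothetical definition: lyric words per second of audio."""
    return seg.word_count / seg.duration


def schedule_compression(segments, target_total, max_cut=0.3):
    """Greedy sketch (hypothetical, not LDF/LDGF): shrink lowest-density segments first.

    max_cut is an assumed cap on the fraction of any one segment that may be
    removed, intended to limit audible artifacts. Returns new durations.
    """
    new = [s.duration for s in segments]
    excess = sum(new) - target_total  # seconds that must be removed
    if excess <= 0:
        return new  # already short enough; leave every segment untouched

    # Visit segments from least to most lyrics-dense (sorted() is stable).
    order = sorted(range(len(segments)), key=lambda i: lyrics_density(segments[i]))
    for i in order:
        if excess <= 0:
            break
        room = segments[i].duration * max_cut  # most time this segment may lose
        cut = min(room, excess)
        new[i] -= cut
        excess -= cut

    if excess > 1e-9:
        raise ValueError("target duration unreachable under max_cut")
    return new


if __name__ == "__main__":
    # Hypothetical 320 s song that must be resized to 280 s.
    song = [
        Segment("intro", 30, 0),
        Segment("verse1", 70, 84),
        Segment("chorus1", 40, 80),
        Segment("interlude", 30, 0),
        Segment("verse2", 70, 84),
        Segment("chorus2", 40, 80),
        Segment("outro", 40, 8),
    ]
    for seg, d in zip(song, schedule_compression(song, 280)):
        print(f"{seg.name:10s} {seg.duration:5.1f} s -> {d:5.1f} s")
```

In this example the instrumental intro and interlude and the sparse outro absorb most of the 40 s reduction, the first verse absorbs the remaining 10 s, and both choruses keep their original length. Each segment would then be time-scaled by its own factor with a time-scale modification method such as WSOLA or a phase vocoder, rather than applying one uniform factor to the whole song.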