Automatic conversion from speech to rap music

Speech-to-music conversion has become increasingly popular in recent years. However, existing approaches cannot be directly applied to automatic conversion from speech to rap, because rap is a distinctive music genre that contains many stressed syllables aligned with the beats of the music accompaniment. This paper presents the first speech-to-rap system. The system first applies forced alignment to both rap acapellas and speech with the same lyrics to obtain word segments, which are then used to compute the conversion factors for prosodic features such as pitch and duration. We then employ a phase vocoder to convert the original speech based on the rap acapella's pitch and duration. Next, a rhythmic effect is added to the synthesized rap acapella according to beat information detected by a beat tracking algorithm. Finally, the synthesized result is combined with the accompaniment track to form a rap song. A subjective test with 22 subjects yields a mean opinion score of 3.3 out of 5, which demonstrates the feasibility of the proposed approach while leaving room for improvement.
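The abstract does not give the formulas used for the conversion factors, so the following is only a minimal sketch of how per-word factors could be derived once forced alignment has produced matching word segments. The `Segment` type, the `conversion_factors` function, the use of a mean F0 per word, and the sample timings are all illustrative assumptions, not the authors' implementation. The duration factor is taken as the ratio of rap to speech word durations, and the pitch factor as the interval between mean F0 values in semitones.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One word segment from forced alignment (hypothetical structure)."""
    word: str
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    f0_hz: float   # mean fundamental frequency over the segment (assumed available)

    @property
    def duration(self) -> float:
        return self.end - self.start


def conversion_factors(speech, rap):
    """Pair segments word by word and return (word, time_stretch, semitones).

    time_stretch < 1 means the spoken word must be shortened to match the rap;
    semitones > 0 means the spoken word must be shifted up in pitch.
    These definitions are assumptions made for illustration.
    """
    if len(speech) != len(rap):
        raise ValueError("speech and rap must have the same number of word segments")
    factors = []
    for s, r in zip(speech, rap):
        if s.word != r.word:
            raise ValueError(f"word mismatch: {s.word!r} vs {r.word!r}")
        stretch = r.duration / s.duration
        semitones = 12.0 * math.log2(r.f0_hz / s.f0_hz)
        factors.append((s.word, stretch, semitones))
    return factors


# Made-up alignment output for two words with identical lyrics.
speech = [Segment("keep", 0.00, 0.30, 110.0), Segment("moving", 0.30, 0.80, 120.0)]
rap = [Segment("keep", 0.00, 0.15, 220.0), Segment("moving", 0.15, 0.55, 120.0)]
factors = conversion_factors(speech, rap)
```

In a pipeline like the one described, factors of this kind would then drive the time-scale and pitch modification performed by the phase vocoder, one word segment at a time.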
