Time alignment for scenario and sounds with voice, music and BGM

This paper proposes a new time alignment method between a scenario and sound containing voice, music and BGM (background music), in order to generate video captions automatically. The proposed method, the Voice-Music-Pause+BGM method, is based on the composition of voice and music models. Evaluation experiments show that the proposed method performs about 10 to 60 times better than conventional time alignment methods.
