Automatic Realistic Music Video Generation from Segments of Youtube Videos

A music video (MV) is a video that visually illustrates or extends the meaning of its background music. This paper proposes a novel method that automatically generates, from an input music track, a music video composed of segments of YouTube music videos that fit this music. The system analyzes the input music to determine its genre (pop, rock, ...) and retrieves segmented MVs of the same genre from the database. K-means clustering then groups the video segments by color histogram, so that each cluster contains segments with a similar global color distribution. A few clusters are selected at random and assembled around music boundaries, i.e., moments where a significant change occurs in the music (for instance, the transition from verse to chorus). This way, when the music changes, the color mood of the video changes as well. This work aims to generate high-quality, realistic MVs that could be mistaken for man-made MVs. We asked users to identify each video in a batch containing professional MVs, amateur-made MVs, and MVs generated by our algorithm: 45% of the generated videos were mistaken for professional MVs and 21.6% for amateur-made MVs, which shows that our algorithm gives satisfying results. More information can be found on the project website: this http URL
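
The abstract does not specify the histogram binning, the distance measure, the number of clusters, or how segments are placed inside a music section, so the following is only a minimal Python/NumPy sketch of the clustering and assembly steps under stated assumptions, not the authors' implementation. Genre classification and music boundary detection are assumed to have been done upstream (the boundaries are passed in as times in seconds). The function names `color_histogram`, `kmeans` and `assemble`, the 4-bins-per-channel RGB histogram, and the rule of switching to a different cluster at every boundary are all hypothetical illustrative choices.

```python
import numpy as np


def color_histogram(frames: np.ndarray, bins: int = 4) -> np.ndarray:
    """Global RGB color histogram of one video segment, normalized to sum to 1.

    frames: uint8 array of shape (n_frames, height, width, 3).
    Assumption: 4 bins per channel (64 joint bins); the paper's binning is not given.
    """
    pixels = frames.reshape(-1, 3).astype(float)
    # Quantize each channel into `bins` levels and count pixels in the joint 3-D grid.
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / hist.sum()


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's K-means on the rows of x; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every histogram to its nearest center (squared Euclidean distance).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it in place if it has none.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def assemble(durations, labels, boundaries, total, n_select=3, seed=0):
    """Hypothetical assembly: fill each music section with segments of one color cluster.

    durations: segment lengths in seconds; boundaries: music boundary times in seconds;
    total: track length in seconds. Returns (start, end, cluster, segment_indices) per section.
    """
    rng = np.random.default_rng(seed)
    clusters = np.unique(labels)
    # Randomly select a few clusters to be used for the whole video.
    selected = list(rng.choice(clusters, size=min(n_select, len(clusters)), replace=False))
    edges = [0.0] + sorted(boundaries) + [total]
    timeline, prev = [], None
    for start, end in zip(edges[:-1], edges[1:]):
        # Switch to a different cluster at each boundary so the color mood follows the music.
        options = [c for c in selected if c != prev] or selected
        cluster = int(rng.choice(options))
        members = np.flatnonzero(labels == cluster)
        picked, t = [], start
        while t < end:  # the last segment may overrun `end`; it would be trimmed when rendering
            i = int(rng.choice(members))
            picked.append(i)
            t += durations[i]
        timeline.append((start, end, cluster, picked))
        prev = cluster
    return timeline


# Usage on synthetic data: three reddish and three bluish 2-second segments.
red = np.full((2, 4, 4, 3), (200, 10, 10), dtype=np.uint8)
blue = np.full((2, 4, 4, 3), (10, 10, 200), dtype=np.uint8)
hists = np.stack([color_histogram(s) for s in [red, blue, red, blue, red, blue]])
labels, _ = kmeans(hists, k=2)
timeline = assemble([2.0] * 6, labels, boundaries=[8.0, 16.0], total=24.0, n_select=2)
```

Representing each segment by a single normalized histogram makes segments of different lengths and resolutions directly comparable, which is what lets plain Euclidean K-means group them by overall color mood; a real system would more likely use a library implementation such as scikit-learn's `KMeans` and a histogram distance tuned to the data.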
