Visualyre: Multimodal visualization of lyrics

In this paper, we present Visualyre, a web application that synthesizes images based on the semantics of a song's lyrics and the mood of its music. We use a multimodal approach: a text-to-image generative model first produces initial images from the lyrics (text) of a song, and a style transfer model conditioned on the mood of the music (audio) then stylizes them. Our target users are independent music artists; Visualyre gives composers and songwriters a means to generate suitable images, such as album covers, for their music. We discuss possible uses of such an application, as well as possible improvements in future iterations.
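The two-stage flow described above can be sketched roughly as follows. This is only an illustrative outline, not the actual Visualyre implementation: the class `LyricsVisualizer`, the three stage callables, the `toy_*` stand-ins, and the mood labels "happy" and "sad" are all hypothetical names introduced here, and the toy stages simply manipulate a small grid of numbers in place of real generative and style transfer models.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an image: a grid of grayscale values in [0, 1].
Image = List[List[float]]


@dataclass
class LyricsVisualizer:
    """Chains the two modalities: lyrics (text) first, then music mood (audio)."""
    text_to_image: Callable[[str], Image]    # stage 1: lyrics -> initial image
    classify_mood: Callable[[bytes], str]    # audio -> mood label
    stylize: Callable[[Image, str], Image]   # stage 2: mood-conditioned styling

    def run(self, lyrics: str, audio: bytes) -> Image:
        base = self.text_to_image(lyrics)    # content comes from the text
        mood = self.classify_mood(audio)     # style is chosen from the audio
        return self.stylize(base, mood)


# Toy placeholders (assumptions for illustration only, not real models).
def toy_text_to_image(lyrics: str) -> Image:
    # Brightness grows with the word count, capped at 10 words.
    value = min(len(lyrics.split()), 10) / 10.0
    return [[value, value], [value, value]]


def toy_classify_mood(audio: bytes) -> str:
    # Arbitrary rule standing in for an audio mood classifier.
    return "happy" if len(audio) % 2 == 0 else "sad"


def toy_stylize(image: Image, mood: str) -> Image:
    # Each mood maps to a brightness gain, standing in for a style image.
    gain = {"happy": 1.0, "sad": 0.5}[mood]
    return [[pixel * gain for pixel in row] for row in image]
```

Passing the stages in as callables keeps the text and audio branches independent, so either placeholder could be swapped for a real model without changing `run`.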
