A Lightweight Music Texture Transfer System

Deep learning research on transformation problems for images and text has attracted great attention. However, existing methods for music feature transfer using neural networks remain far from practical application. In this paper, we present a novel system for transferring the texture of music and release it as an open-source project. Its core algorithm comprises a converter that represents sounds as texture spectra, a corresponding reconstructor, and a feed-forward transfer network. We evaluate the system from multiple perspectives, and experimental results show that it achieves convincing results in both audio quality and computational performance.
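The converter/reconstructor pair described above can be illustrated with a minimal numpy-only sketch, assuming the converter is an STFT magnitude ("texture") spectrum and the reconstructor is Griffin-Lim style phase estimation. This is an illustrative assumption, not the paper's actual implementation; all function names and parameters here are hypothetical.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Converter sketch: map a waveform to a complex STFT (Hann window)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(S, n_fft=512, hop=128):
    """Inverse STFT via overlap-add with squared-window normalization."""
    win = np.hanning(n_fft)
    n_frames = S.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(S, n=n_fft, axis=1)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n_fft] += f * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def reconstruct(mag, n_iter=50, n_fft=512, hop=128):
    """Reconstructor sketch: Griffin-Lim phase recovery from a magnitude
    spectrogram (iteratively project between time and frequency domains)."""
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)
```

In a full pipeline of this kind, a feed-forward network would transform the texture spectrum `mag` before `reconstruct` turns it back into audio; the sketch covers only the representation and inversion steps.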
