DEEP LEARNING AND INTELLIGENT AUDIO MIXING

Mixing multitrack audio is a crucial part of music production. Recent advances in machine learning, particularly deep learning, motivate research into how these methods can be applied to automatic mixing. In this paper, we survey intelligent audio mixing systems and their recent incorporation of deep neural networks. We propose a research trajectory for deep learning applied to intelligent music production systems. We conclude with a proof of concept that treats stem audio mixing as a content-based transformation using a deep autoencoder.
