Synthesizer Sound Matching with Differentiable DSP

While synthesizers have become commonplace in music production, many users find it difficult to set a synthesizer's parameters to produce the sound they intend. To assist the user, the sound matching task aims to estimate the synthesis parameters that produce a sound closest to a query sound. Recently, neural networks have been employed for this task. These networks are trained on paired data of synthesis parameters and the corresponding output sound, and they optimize a loss on the synthesis parameters. However, synthesis parameters are only indirectly correlated with the audio output, so a small parameter error does not guarantee a perceptually close sound. Another problem is that the queries made by users usually consist of real-world sounds, which differ from the synthesizer output used during training. In this paper, we propose a novel approach to synthesizer sound matching: we implement a basic subtractive synthesizer using differentiable DSP modules. This synthesizer has interpretable controls and is similar to those used in music production. Because the synthesizer is differentiable, we can train an estimator network by directly optimizing the spectral similarity between the synthesized output and the target sound. Furthermore, we can train the network on real-world sounds whose ground-truth synthesis parameters are unavailable. We pre-train the network with a parameter loss and then fine-tune it with a spectral loss using real-world sounds. We show that the proposed method finds better matches than baseline models.
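The gap between a parameter loss and a spectral loss can be illustrated with a minimal sketch. It is not the synthesizer or the loss used in the paper. Everything in it is an assumption made for illustration: the two-parameter additive toy synth (`saw_mix`, a normalized `cutoff`), the fourth-order low-pass gain curve, the single-resolution linear-plus-log magnitude distance, and the sample rate and note length. It uses only the Python standard library and computes no gradients; it only compares the two loss values for two candidate parameter sets.

```python
import cmath
import math

SR = 16000   # sample rate in Hz (assumed for this toy example)
N = 256      # samples per toy "note" (assumed)
F0 = 500.0   # fundamental in Hz; falls exactly on a DFT bin for SR and N above


def toy_synth(saw_mix, cutoff, n_harmonics=8):
    """Hypothetical stand-in for a subtractive synth.

    saw_mix in [0, 1] adds sawtooth-like harmonics (amplitude saw_mix / k);
    cutoff in (0, 1] is a low-pass cutoff as a fraction of the Nyquist frequency.
    """
    cutoff_hz = cutoff * SR / 2
    amps = []
    for k in range(1, n_harmonics + 1):
        source = 1.0 if k == 1 else saw_mix / k
        lowpass = 1.0 / math.sqrt(1.0 + (k * F0 / cutoff_hz) ** 4)
        amps.append(source * lowpass)
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * F0 * t / SR)
            for k, a in enumerate(amps))
        for t in range(N)
    ]


def magnitude_spectrum(signal):
    """Magnitudes of a naive DFT for bins 0 .. N/2, scaled by 1/N."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * b * t / n)
                for t, x in enumerate(signal))) / n
        for b in range(n // 2 + 1)
    ]


def parameter_loss(params_a, params_b):
    """Mean squared error between two normalized parameter tuples."""
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b)) / len(params_a)


def spectral_loss(audio_a, audio_b, eps=1e-7):
    """Mean L1 distance on linear plus log magnitudes (one resolution only)."""
    mag_a = magnitude_spectrum(audio_a)
    mag_b = magnitude_spectrum(audio_b)
    linear = sum(abs(a - b) for a, b in zip(mag_a, mag_b))
    log = sum(abs(math.log(a + eps) - math.log(b + eps))
              for a, b in zip(mag_a, mag_b))
    return (linear + log) / len(mag_a)


target = (0.0, 0.25)       # pure sine, cutoff at 2 kHz
candidate_a = (0.0, 0.75)  # cutoff far from the target, but a sine is barely filtered
candidate_b = (0.4, 0.25)  # cutoff matches, but extra harmonics are audible

target_audio = toy_synth(*target)
param_a = parameter_loss(candidate_a, target)
param_b = parameter_loss(candidate_b, target)
spec_a = spectral_loss(toy_synth(*candidate_a), target_audio)
spec_b = spectral_loss(toy_synth(*candidate_b), target_audio)

print(f"candidate A: parameter loss {param_a:.4f}, spectral loss {spec_a:.6f}")
print(f"candidate B: parameter loss {param_b:.4f}, spectral loss {spec_b:.6f}")
```

In this toy setting, candidate A is further from the target in parameter space yet closer in the spectral domain, because moving the cutoff of a filter well above a pure sine hardly changes the sound. A differentiable synthesizer lets the estimator be trained on the second quantity directly, and since the spectral loss needs only audio, it can also be evaluated on sounds for which no ground-truth parameters exist.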
