Perceptual loss function for neural modeling of audio systems

This work investigates alternative pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. In our previous work, the error-to-signal ratio loss function was used during network training, with a first-order high-pass pre-emphasis filter applied to both the target signal and the neural network output. This work considers more perceptually relevant pre-emphasis filters, which include low-pass filtering at high frequencies. We conducted listening tests to determine whether these filters improve the quality of a neural network model of a guitar tube amplifier. The listening test results indicate that, among the tested filters, an A-weighting pre-emphasis filter offers the largest improvement. The proposed perceptual loss function improves the sound quality of neural network models in audio processing without affecting the computational cost.
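As a rough illustration of the baseline loss described above, the following sketch computes an error-to-signal ratio after applying a first-order high-pass pre-emphasis filter to both the target and the model output. It is a minimal NumPy sketch, not the authors' implementation: the function names `pre_emphasis` and `esr_loss`, the filter coefficient of 0.95, and the small `eps` stabilizer are assumptions chosen for illustration. The A-weighting and low-pass variants studied in the work are not shown.

```python
import numpy as np


def pre_emphasis(x, coeff=0.95):
    """First-order high-pass FIR filter: y[n] = x[n] - coeff * x[n-1].

    The coefficient 0.95 is an assumed, illustrative value.
    """
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= coeff * x[:-1]  # the first sample has no predecessor and is passed through
    return y


def esr_loss(target, output, coeff=0.95, eps=1e-12):
    """Error-to-signal ratio computed on pre-emphasized signals.

    Both signals pass through the same filter, so the filter only
    changes how errors are weighted across frequency.
    """
    t = pre_emphasis(target, coeff)
    o = pre_emphasis(output, coeff)
    # eps (an assumption) guards against division by zero for silent targets
    return np.sum((t - o) ** 2) / (np.sum(t ** 2) + eps)
```

Because the filter is applied only when the loss is evaluated during training, it adds no cost at inference time, which is consistent with the claim that the computational cost of the model is unaffected.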
