INVERTING DYNAMICS COMPRESSION WITH MINIMAL SIDE INFORMATION

ABSTRACT

Dynamics processing is a widespread technique, both at the music production and diffusion stages. In particular, dynamic compression is often used in such a way that the “average” listener can best enjoy the music. However, this may lead to an excessive use of compression, especially with respect to listeners in quiet listening conditions. This paper presents estimates of the amount of extra data that is needed to invert the effects of such non-linear processing, using simple blind identification techniques. We present two simple test cases: first, the case where perfect reconstruction is needed, and second, the case where the ancillary data rate is constrained, leading to an approximate reconstruction.

1. INTRODUCTION

A common complaint amongst artists and sound engineers is that dynamics compression (hereafter just called compression) is often overused. This is not only true at the production stage: extra compression is almost systematically added by radio stations. There are many reasons for the use of compression. First, in a noisy environment (e.g. listening to music in a car or on an iPod-type portable device in the street), or with a constrained transmission channel (e.g. the maximum modulation for FM radio has to obey national regulations; such physical limitations were also present on vinyl LPs), compressed music renders most of the music content without the listener having to constantly change the volume between soft and loud passages. Second, compression gives a timbral identity to certain types of music and/or radio stations. Third, sound engineers and producers use high compression because they don’t want their music to sound dull compared to their competitors’ highly compressed music; a typical vicious circle that, according to many, has gone too far.

This feeling is shared by a number of listeners who like to listen to music on medium- to high-end equipment, in low background noise levels.
For these listeners, who are numerous though not the majority, there is the feeling that this is “too late”. Indeed, due to the non-linearities of dynamics processing in finite precision, some information has been irreversibly lost. Placing a dynamics expander after a compressor does not recover the original sound, and often results in the so-called “pumping” sound, which is often unpleasant when unintentional.

Most of the techniques proposed so far to weaken these effects (see [1]) are based on the idea that listeners would be given the combination { original sound + processing parameters }: in this way each user can decide whether or not to apply dynamics processing. However, this idea has never caught on in practice. Amongst many explanations, it can be suggested that this is due to a simple imbalance between the number of users and the extra “cost” (in terms of transmission / storage bandwidth) required for the sound processing parameters: such a system imposes a cost on everyone for the benefit of a few. In other words, the majority of people, for whom high compression is useful / desirable, need an extra amount of information; this system is perfectly designed (i.e. with no overhead) only for the minority of “audiophiles”. Another drawback of this system is that it often restricts the type of processing that can be used, and/or involves the explicit transmission of all processing operations, which are often considered a “trademark” of a sound engineer / music label / radio station.

In this paper, we investigate the opposite scheme: the data transmitted / stored is { compressed sound + reverse-processing parameters }. The main advantage of such a scheme is that it is entirely backwards compatible: it doesn’t require any change for the majority of users who are happy with the compressed sound, who can just ignore the extra parameters.
The “hi-fi” listeners, with properly equipped devices (and possibly higher transmission / storage bandwidth), can choose to cancel the dynamics processing thanks to the extra set of parameters. Also, the scheme is totally independent of the details of the processor used, since our system is based on a “blind” estimation of non-linearities [2]: the ancillary parameters are derived with no a priori knowledge of the processor, apart from the fact that it is based on an instantaneous level-dependent gain. In this way, the fine “trademark” details of the successive hardware or software stages in studio compression techniques also remain hidden. The drawback of this technique is that the amount of ancillary data for reverse-processing is a priori significantly higher than in the first scenario.

The goal of this paper is to present preliminary results that quantify the amount of extra data for dynamics compression reverse-processing. This data is divided into two parts. First, an estimate of the instantaneous gain, which has to be subsampled and quantized at finite precision. Second, since this approximate gain only yields an estimate of the original signal, the residual between the estimated and true original must also be encoded. Two test cases have been studied. In the “lossless” scenario, we have investigated how much extra data is needed to exactly recover the original signal. This scenario is appropriate in a digital storage / transmission context. In the “lossy” scenario, we are given stringent bitrate constraints for the ancillary data, and try to get as close as possible to the original sound. This may be relevant, for instance, when trying to invert compression in FM radio using ancillary data transmitted in the RDS channel.

The rest of this paper is organized as follows. In section 2 we briefly recall the principle of dynamics compression. Section 3 introduces the way to estimate the reverse-processing parameters. Section 4 gives the details of numerical experiments.
Results regarding the sound quality at constrained bitrate are presented sec-
DAFX-1
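The two-part ancillary data described in the introduction (a subsampled, quantized gain estimate plus a residual) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy compressor, the block length `BLOCK`, the fixed-point gain precision `GAIN_FRAC_BITS`, the per-block least-squares gain estimate, and all function names are hypothetical. The encoder only sees the original and compressed signals, never the compressor settings, which mirrors the "blind" setting in spirit.

```python
import math

BLOCK = 64           # assumed gain subsampling factor (one gain per 64 samples)
GAIN_FRAC_BITS = 8   # assumed fixed-point precision of the transmitted gain


def toy_compressor(x, threshold=2000, ratio=4.0):
    """Hypothetical compressor: a peak follower drives a level-dependent gain."""
    y, env = [], 0.0
    for s in x:
        env = max(abs(s), 0.999 * env)  # instant attack, slow exponential release
        gain = 1.0 if env <= threshold else (threshold / env) ** (1.0 - 1.0 / ratio)
        y.append(int(round(s * gain)))  # output is integer PCM (finite precision)
    return y


def predict(sample, gq):
    """Integer-only prediction of an original sample from a compressed one."""
    return (sample * gq + (1 << (GAIN_FRAC_BITS - 1))) >> GAIN_FRAC_BITS


def encode_side_info(x, y):
    """Encoder: derive quantized block gains and the residual from x and y only."""
    gains, residual = [], []
    for start in range(0, len(x), BLOCK):
        xb, yb = x[start:start + BLOCK], y[start:start + BLOCK]
        energy = sum(v * v for v in yb)
        # Least-squares inverse gain for this block (1.0 if the block is silent).
        g = sum(a * b for a, b in zip(xb, yb)) / energy if energy else 1.0
        gq = max(0, int(round(g * (1 << GAIN_FRAC_BITS))))
        gains.append(gq)
        residual.extend(a - predict(b, gq) for a, b in zip(xb, yb))
    return gains, residual


def decode(y, gains, residual):
    """Decoder: compressed signal + side information -> reconstructed signal."""
    return [predict(s, gains[i // BLOCK]) + r
            for i, (s, r) in enumerate(zip(y, residual))]


# Test signal: a 440 Hz tone with slowly rising amplitude, as 16-bit-range integers.
x = [int(round(12000 * math.sin(2 * math.pi * 440 * n / 44100)
               * (0.1 + 0.9 * n / 4095))) for n in range(4096)]
y = toy_compressor(x)
gains, residual = encode_side_info(x, y)

x_lossless = decode(y, gains, residual)     # "lossless" case: gains + residual
x_lossy = decode(y, gains, [0] * len(y))    # "lossy" case: gains only
```

Because the prediction uses integer arithmetic only, the decoder reproduces the encoder's prediction exactly, so adding the residual recovers the original in the "lossless" case. Dropping the residual gives an approximate reconstruction in the spirit of the "lossy" case; the actual data rates depend on how the gains and residual are entropy coded, which this sketch does not model.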

Laurent Daudet | Benoit Lachaise

[1] A. Said. Introduction to Arithmetic Coding - Theory and Practice, 2023, ArXiv.

[2] Allen Gersho, et al. Vector quantization and signal compression, 1991, The Kluwer international series in engineering and computer science.

[3] Amir Said, et al. Chapter 5 – Arithmetic Coding, 2003.

[4] Birger Kollmeier, et al. PEMO-Q - A New Method for Objective Audio Quality Assessment Using a Model of Auditory Perception, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[5] Udo Zoelzer, et al. DAFX: Digital Audio Effects, 2011.

[6] Joerg Bitzer, et al. Parameter Estimation of Dynamic Range Compressors: Models, Procedures and Test Signals, 2006.