Why 1-Bit Sigma-Delta Conversion is Unsuitable for High-Quality Applications

Single-stage, 1-bit sigma-delta converters are in principle imperfectible. We prove this fact. The reason, simply stated, is that, when properly dithered, they are in constant overload. Preventing overload allows only partial dithering to be performed. The consequence is that distortion, limit cycles, instability, and noise modulation can never be totally avoided. We demonstrate these effects and, using coherent averaging techniques, are able to display the consequent profusion of nonlinear artefacts which are usually hidden in the noise floor. Recording, editing, storage, or conversion systems using single-stage, 1-bit sigma-delta modulators are thus inimical to audio of the highest quality. In contrast, multi-bit sigma-delta converters, which output linear PCM code, are in principle infinitely perfectible. (Here, multi-bit refers to at least two bits in the converter.) They can be properly dithered so as to guarantee the absence of all distortion, limit cycles, and noise modulation. The audio industry is misguided if it adopts 1-bit sigma-delta conversion as the basis for any high-quality processing, archiving, or distribution format to replace multi-bit, linear PCM.

0. INTRODUCTION

This paper is an enlarged and extended version of [1], and its findings regarding 1-bit sigma-delta modulators are explored in greater detail in an associated paper [2]. In the past twenty or so years we have seen the multi-bit converter technology used in professional and consumer equipment progress from 14, through 16 and 18, to 20 or more bits of resolution. Indeed, the 16-bit linear PCM format became enshrined in the CD standard, and was the basis of most digital audio storage devices for many years. All analogue-to-digital and digital-to-analogue conversions and intermediate digital signal processing steps were performed in the linear, multi-bit PCM format, using internal processing wordlengths greater than the desired final numerical precision.
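The abstract makes two structural claims. A properly dithered 1-bit quantizer is in constant overload, while a quantizer of two or more bits can be properly dithered. Simple level counting illustrates both claims. The sketch below is a back-of-envelope check under stated assumptions, not the proof given later in this paper. It assumes a mid-riser quantizer with M output levels spaced one LSB apart. It treats "no overload" as keeping the quantization error within ±½ LSB, and it takes the TPDF dither peak as 1 LSB. The function name `signal_headroom` is hypothetical.

```python
import math


def signal_headroom(levels: int, lsb: float = 1.0) -> float:
    """Peak signal that still avoids overload once peak TPDF dither is added.

    Illustrative assumptions (not taken from the paper):
      * mid-riser quantizer with `levels` output levels spaced `lsb` apart;
      * no overload means |quantizer input| <= levels * lsb / 2, so the
        quantization error stays within +/- lsb / 2;
      * TPDF dither spans [-lsb, +lsb], i.e. two LSBs peak to peak.
    """
    no_overload_range = levels * lsb / 2.0
    tpdf_peak = lsb
    return max(no_overload_range - tpdf_peak, 0.0)


for bits in (1, 2, 16):
    m = 2 ** bits
    print(f"{bits:2d}-bit ({m} levels): signal headroom = {signal_headroom(m):g} LSB")

# Cost of the dither in a 16-bit system, as a loss of available peak level.
m16 = 2 ** 16
loss_db = 20.0 * math.log10((m16 / 2) / signal_headroom(m16))
print(f"16-bit dither headroom cost: {loss_db:.6f} dB")
```

Under these assumptions, a two-level quantizer has no room left for any signal, because the dither alone fills its non-overload range. A four-level (2-bit) quantizer keeps one LSB. In a 16-bit system, the dither costs only about 0.0003 dB of peak level.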
LIPSHITZ AND VANDERKOOY, WHY 1-BIT SIGMA-DELTA CONVERSION IS UNSUITABLE, AES 110 CONVENTION, AMSTERDAM, NETHERLANDS, 2001 MAY 12–15

One primary benefit of this format is that such systems can be rendered completely linear, with infinite resolution below the least significant bit (LSB), by the adoption of proper dithering at each quantizing, or (in the case of editing and signal processing) at each requantizing, stage. Such dithering, with the optimal triangular probability density function (TPDF) dither, in principle completely eliminates all distortion, noise modulation, and other signal-dependent artefacts, leaving a storage system with a constant, signal-independent, and hence benign noise floor (see [3] and [4]). This is now well understood, and such practices have been the norm in the industry for over a decade. In practice, of course, no actual analogue realization can achieve this theoretical perfection, but in the digital domain the departure from perfection can indeed be zero, owing to the exactly specified numerical precision of the arithmetic operations involved. In recent years, we have seen the consumer audio industry perform a remarkable feat of salesmanship by proclaiming that 1-bit converters are better than multi-bit converters, and succeeding in marketing 1-bit products as preferable for the highest-quality performance. The original primary motivation for pursuing the 1-bit converter architecture was not superior performance, but rather the fact that it is cheaper to manufacture, consumes less power, and can operate well at the voltages used in battery-powered portable equipment. That motivation has now become secondary, as 1-bit converters are currently used in consumer audio equipment at all price and quality levels. The manufacturers of high-quality converters struggled mightily to produce 1-bit devices that met the performance goals of the industry.
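The claim that TPDF dither leaves a signal-independent noise floor can be checked numerically. The sketch below measures the mean and variance of the total error Q(x + n) − x for several constant inputs x. It assumes a rounding (mid-tread) quantizer with a 1-LSB step. It generates TPDF dither as the sum of two independent uniform variables, which is one common way to produce it. The helper names are illustrative only.

```python
import math
import random

LSB = 1.0  # quantizer step (assumed)


def quantize(w: float) -> float:
    """Mid-tread rounding quantizer with step LSB (no saturation)."""
    return LSB * math.floor(w / LSB + 0.5)


def tpdf_dither(rng: random.Random) -> float:
    """TPDF dither on [-LSB, +LSB): sum of two independent uniforms on [-LSB/2, +LSB/2)."""
    return (rng.random() + rng.random() - 1.0) * LSB


def total_error_stats(x: float, dithered: bool, n: int = 100_000, seed: int = 1):
    """Mean and variance of the total error Q(x + dither) - x for a constant input x."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        d = tpdf_dither(rng) if dithered else 0.0
        errors.append(quantize(x + d) - x)
    mean = sum(errors) / n
    var = sum((err - mean) ** 2 for err in errors) / n
    return mean, var


for x in (0.0, 0.25, 0.3, 0.5):
    m0, v0 = total_error_stats(x, dithered=False)
    m1, v1 = total_error_stats(x, dithered=True)
    print(f"x = {x:4.2f} LSB | undithered: mean {m0:+.3f}, var {v0:.3f}"
          f" | TPDF: mean {m1:+.3f}, var {v1:.3f}")
```

Without dither, the error is a fixed, input-dependent value with zero variance (for example −0.3 LSB when x = 0.3 LSB), which is distortion. With TPDF dither, the mean should stay near zero and the variance near 0.25 LSB² for every input. That variance is Δ²/12 from quantization plus Δ²/6 from the dither.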
But they could never eliminate all the undesirable artefacts of such converters, and after more than a decade of trying, they came to the realization that they could achieve better performance by using multi-bit converter architectures in their products. The one inherent advantage of the 1-bit architecture, namely its avoidance of the level-matching difficulties found in multi-bit converters, turned out not to be as significant a benefit as one might have thought. If one examines the current data-sheets of all the major high-quality converter manufacturers, one finds that they have almost universally given up on the 1-bit sigma-delta topology in favor of oversampling converters using more than two levels. Such converter architectures can avoid the intractabilities of both the 1-bit and the 20-or-more-bit designs. They can be properly dithered, and can thus be guaranteed to be free of low-level, limit-cycle oscillations (“birdies”). Moreover, they do not suffer from the high-level instability problems of the higher-order, 1-bit sigma-delta converters. In light of the above, it is with alarm that we note the adoption of the single-stage, 1-bit sigma-delta converter architecture as the encoding standard for a next-generation (and supposedly higher-quality) consumer digital audio format. We refer, of course, to the Direct Stream Digital (DSD) encoding which forms the basis of the Super Audio CD format introduced recently by Philips and Sony (see, for example, [5] and [6]). The original intention was to store the digital audio data in the 1-bit DSD format at every stage of the processing, from the original analogue-to-digital conversion through all the editing and mastering operations. That intention has apparently now been abandoned. This was a wise decision. The conversion to the final 1-bit DSD format, however, still represents a required, and quite unnecessary, degradation of the quality of the audio signal.
Every single 1-bit data conversion entails an inevitable loss of signal quality in a way which need not occur with multi-bit, linear PCM. The original rationale for storing a 1-bit DSD format signal on the Super Audio CD has now entirely vanished. The analogue-to-digital and digital-to-analogue conversions, and all intermediate digital signal processing, will likely be done using multi-bit converters and storage formats. There really is no point in degrading the signal by squeezing it onto a 1-bit Super Audio CD for transmission to the consumer, only to have it converted back to multi-bit PCM in the process of being played back. We shall now explain our reasoning in detail.

(1, 2) Trademarks of Philips Electronics NV and Sony Electronics Inc.

1. MULTI-BIT VERSUS 1-BIT CONVERTERS

In a normal multi-bit digital audio system, the intention is that the quantizer (i.e., essentially the number system) is never deliberately driven into saturation. Because one has enough levels available, avoiding saturation is not a significant problem in practice. Moreover, there is no problem in devoting a few LSBs of headroom to ensuring that quantization errors are properly dithered. In straight linear PCM encoding, the proper (i.e., TPDF) dither spans precisely two LSBs peak to peak. For example, in a straight 16-bit system, the dither occupies only two of the 65,536 levels available. This causes a negligible reduction in system headroom in return for all the acknowledged benefits of properly dithered signal manipulation. If one wishes to reduce the data word-length used, one can recover the lost signal-to-noise ratio by a combination of oversampling and noise shaping. Alternatively, one can increase the system's signal-to-noise ratio by the use of oversampling and/or noise shaping, while leaving the word-length unchanged. Noise shaping allows one to increase the signal-to-noise ratio in the audio band at the expense of decreasing it at frequencies above the audio band.
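The in-band versus out-of-band trade-off of noise shaping can be quantified for the simplest case. The sketch below assumes a first-order error-feedback filter H(z) = z⁻¹. The noise-transfer function is then 1 − z⁻¹, with power gain |1 − e^(−jω)|² = 2(1 − cos ω). It also assumes an oversampling ratio of 64, chosen for illustration and not taken from this paper. It integrates the power gain numerically, relative to unshaped white quantization noise, over the full band and over the in-band region 0 ≤ ω ≤ π/64.

```python
import math


def ntf_power_gain(omega: float) -> float:
    """|1 - e^{-j omega}|^2 for the first-order shaper H(z) = z^-1 (assumed)."""
    return 2.0 * (1.0 - math.cos(omega))


def band_power(lo: float, hi: float, n: int = 100_000) -> float:
    """Shaped noise power in [lo, hi], as a fraction of the unshaped noise power
    over the whole band [0, pi] (midpoint-rule integration)."""
    step = (hi - lo) / n
    total = sum(ntf_power_gain(lo + (k + 0.5) * step) for k in range(n))
    return total * step / math.pi


osr = 64  # oversampling ratio (illustrative assumption)
full_band = band_power(0.0, math.pi)
in_band_shaped = band_power(0.0, math.pi / osr)
in_band_unshaped = 1.0 / osr  # flat noise: the in-band share is 1/OSR

print(f"total noise power vs. unshaped: {10 * math.log10(full_band):+.2f} dB")
print(f"in-band noise vs. unshaped oversampled: "
      f"{10 * math.log10(in_band_shaped / in_band_unshaped):+.2f} dB")
```

Under these assumptions the total noise power doubles (about +3 dB). The noise falling below π/64 drops by roughly 31 dB compared with unshaped noise at the same oversampling ratio. The noise has been moved out of the band of interest, not removed.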
One can even use in-band noise shaping without oversampling to significantly increase the perceived signal-to-noise ratio (see [7] and [8]). As long as the quantizer inside the noise shaper does not saturate and is properly dithered, this process is guaranteed to be completely transparent, in that it is totally distortion free. Noise shaping entails negative error feedback around the quantizer. In a noise shaper, a filter H(z) is used to spectrally shape the quantization error E. Fig. 1 shows the architecture of a simple dithered noise-shaping quantizer.

Figure 1. Simple dithered noise-shaping quantizer.

In this diagram, X is the input signal, N is the dither, W is the total input to the quantizer Q, and Y is the output signal. The quantization error E is extracted around the dithered quantizer (which can be multi-bit or single-bit), and subtracted from the input after passing through the noise-shaping filter H(z). H(z) can be either recursive or non-recursive. This is the error feedback loop. The signal in this loop is very small as long as the quantizer does not overload. The dither N controls the statistics of the error signal E such that, with TPDF dither, E has zero mean, constant variance, and a constant white power spectral density, independent of the input signal; indeed, E is then uncorrelated with X. This means that there is no distortion or noise modulation (see [3] and [4]). In addition, the negative feedback loop is stable as long as there is no overload, and this is easily achieved with a multi-bit quantizer Q. The theory of such dithered noise shapers can be found in [7], [8], and [9], for example. In a sampled-data realization, the z-transforms of the input, X(z), output, Y(z), and error, E(z), are related by

Y(z) = X(z) + {1 – H(z)}⋅E(z).
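The error-feedback structure of Fig. 1 can be simulated directly. The sketch below illustrates the structure under stated assumptions; it is not the authors' implementation. It uses a multi-bit rounding quantizer with a 1-LSB step and no saturation, so overload cannot occur, and it applies TPDF dither. H(z) is non-recursive, with hypothetical coefficients `h` applied with at least one sample of delay so that the loop is realizable. E is extracted around the dithered quantizer as described. The script then checks, sample by sample, that the time-domain form of Y(z) = X(z) + {1 – H(z)}⋅E(z) holds.

```python
import math
import random


def quantize(w: float) -> float:
    """Unbounded multi-bit rounding quantizer, step 1 LSB (never saturates)."""
    return math.floor(w + 0.5)


def noise_shaper(x, h, seed=0):
    """Dithered error-feedback quantizer after Fig. 1.

    h[k] multiplies E[n-1-k], i.e. H(z) = sum_k h[k] z^-(k+1) (assumed FIR form).
    Returns the output Y and the error E taken around the dithered quantizer.
    """
    rng = random.Random(seed)
    past_e = [0.0] * len(h)                       # E[n-1], E[n-2], ...
    y, e = [], []
    for xn in x:
        feedback = sum(hk * ek for hk, ek in zip(h, past_e))
        v = xn - feedback                         # input minus filtered error
        dither = rng.random() + rng.random() - 1.0  # TPDF on [-1, 1) LSB
        w = v + dither                            # W: total input to Q
        yn = quantize(w)
        en = yn - v                               # E: output minus dithered-quantizer input
        past_e = [en] + past_e[:-1]
        y.append(yn)
        e.append(en)
    return y, e


# First-order shaper H(z) = z^-1, so the noise-transfer function is 1 - z^-1.
x = [10.0 * math.sin(2 * math.pi * n / 64) for n in range(20_000)]
y, e = noise_shaper(x, h=[1.0])

# Time-domain check of Y = X + (1 - H)E: Y[n] = X[n] + E[n] - E[n-1].
max_dev = max(abs(y[n] - (x[n] + e[n] - (e[n - 1] if n else 0.0)))
              for n in range(len(x)))
mean_e = sum(e) / len(e)
var_e = sum((ek - mean_e) ** 2 for ek in e) / len(e)
print(f"max |Y - (X + (1-H)E)| = {max_dev:.2e}")
print(f"E: mean {mean_e:+.4f} LSB, variance {var_e:.4f} LSB^2 (TPDF theory: 0.25)")
```

Because the quantizer here never saturates, |E| stays within 1.5 LSB, so the feedback signal remains small, as the text describes. The measured mean and variance of E should sit close to the TPDF values of 0 and 0.25 LSB².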
The signal thus passes through the system unchanged, and the quantization error E(z) appears at the output shaped by the effective noise-transfer function {1 – H(z)}, to become the system's total error {1 – H(z)}⋅E(z). Proper TPDF dither N controls the statistical properties and power spectrum of the error signal E, and hence