Paper information - Audio representations for data compression and compressed domain processing


I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and in quality, as a dissertation for the degree of Doctor of Philosophy.

In the world of digital audio processing, one usually has the choice of performing modifications on the raw audio signal or performing data compression on the audio signal. However, performing modifications on a data-compressed audio signal has proved difficult in the past. This thesis provides new representations of audio signals that allow for both very low bit rate audio data compression and high-quality compressed-domain processing and modifications. In this system, two compressed-domain processing algorithms are available: time-scale and pitch-scale modifications. Time-scale modifications alter the playback speed of audio without changing the pitch. Similarly, pitch-scale modifications alter the pitch of the audio without changing the playback speed. The algorithms presented in this thesis segment the input audio signal into separate sinusoidal, transient, and noise signals. During attack-transient regions of the audio signal, the audio is modeled by transform coding techniques. During the remaining non-transient regions, the audio is modeled by a mixture of multiresolution sinusoidal modeling and noise modeling. Careful phase-matching techniques at the time boundaries between the sines and transients allow seamless transitions between the two representations. Because the audio is separated into three individual representations, each can be quantized efficiently and perceptually.
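The segmentation described above suggests how both modifications can operate on the model parameters rather than on raw samples. The following is a minimal Python sketch under stated assumptions, not the thesis's implementation: the `Segment` and `Partial` classes, the `"transient"`/`"tonal"` labels, and the functions `time_scale_segments` and `pitch_scale_partials` are hypothetical names introduced for illustration. It assumes contiguous segments, stretches only the non-transient (sines plus noise) regions by a factor `alpha` while leaving transient regions at their original duration, and pitch-scales sinusoidal partials by multiplying their frequencies by `beta` with frame times held fixed. Noise handling, phase matching at segment boundaries, transient pitch-scaling, and resynthesis are all omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical region of a segmented signal.

    kind is "transient" (transform-coded attack region) or "tonal"
    (non-transient region modeled by sines plus noise); times in seconds.
    """
    kind: str
    start: float
    end: float


@dataclass
class Partial:
    """Hypothetical sinusoidal-model parameters for one frame of one track."""
    time: float  # frame time in seconds
    freq: float  # frequency in Hz
    amp: float   # linear amplitude


def time_scale_segments(segments: List[Segment], alpha: float) -> List[Segment]:
    """Map contiguous segments onto a time-scaled output axis.

    Non-transient segments are stretched by alpha; transient segments keep
    their original duration so attacks are not smeared in time.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    out: List[Segment] = []
    t = segments[0].start if segments else 0.0
    for seg in segments:
        duration = seg.end - seg.start
        if seg.kind != "transient":
            duration *= alpha  # stretch only the non-transient regions
        out.append(Segment(seg.kind, t, t + duration))
        t += duration
    return out


def pitch_scale_partials(partials: List[Partial], beta: float) -> List[Partial]:
    """Scale sinusoidal frequencies by beta while keeping frame times fixed,
    so the pitch changes but the playback speed does not."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return [Partial(p.time, p.freq * beta, p.amp) for p in partials]


if __name__ == "__main__":
    segs = [
        Segment("tonal", 0.0, 1.0),
        Segment("transient", 1.0, 1.05),
        Segment("tonal", 1.05, 2.0),
    ]
    for s in time_scale_segments(segs, 2.0):
        print(s.kind, round(s.start, 3), round(s.end, 3))
```

With `alpha = 2.0`, the one-second tonal segment becomes two seconds long while the 50 ms transient stays 50 ms, so the overall duration roughly doubles without stretching the attack.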
In addition, segmenting the audio into transient and non-transient regions makes possible high-quality time-scale modifications that stretch only the non-transient portions.

Acknowledgements

First I would like to thank my principal advisor, Prof. Julius O. Smith III. In addition to being a seemingly all-knowing audio guy, he helped me out immensely through our weekly meetings during my last year in school, which kept me and my research focused and on track. If it were not for the academic freedom he gives me and the other CCRMA grad students, I would not have stumbled across this thesis topic. My next thanks go out to Tony Verma, …

Julius O. Smith | Scott Levine
