Signal compression based on models of human perception

The notion of perceptual coding, which is based on the concept of distortion masking by the signal being compressed, is developed. Progress in this field, resulting from advances in classical coding theory, modeling of human perception, and digital signal processing, is described. It is proposed that fundamental limits in the science can be expressed by the semiquantitative concepts of perceptual entropy and the perceptual distortion-rate function, and current compression technology is examined in that framework. Problems and future research directions are summarized.
