Virtual Microphones for Multichannel Audio Resynthesis

Multichannel audio offers significant advantages for music reproduction, including better localization and envelopment and reduced imaging distortion. However, it is demanding to transmit, and bandwidth limitations often prohibit sending multiple audio channels. In such cases, an alternative is to transmit only one or two reference channels and recreate the remaining channels at the receiving end. Here, we propose a system that synthesizes the required signals from a smaller set of signals recorded in a particular venue. These synthesized "virtual" microphone signals can be used to produce multichannel recordings that accurately capture the acoustics of that venue. Applications of the proposed system include transmission of multichannel audio over the current Internet infrastructure and, as an extension of the methods proposed here, remastering of existing monophonic and stereophonic recordings for multichannel rendering.
