Non-Gaussian, Non-stationary and Nonlinear Signal Processing Methods - with Applications to Speech Processing and Channel Estimation

The Gaussian statistic model, despite its mathematical elegance, is found to be too factitious for many real-world signals, as manifested by its unsatisfactory performance when applied to non-Gaussian signals. Traditional non-Gaussian signal processing techniques, on the other hand, are usually associated with high complexities and low data efficiencies. This thesis addresses the problem of optimum estimation of non-Gaussian signals in computation-efficient and data-efficient ways. The approaches that we have taken exploit the high temporal-resolution non-stationarity or the underlying dynamics of the signals. The sub-topics treated include: joint MMSE estimation of the signal DTFT magnitude and phase, high temporal-resolution Kalman filtering, blind de-convolution and blind system identification, and optimum non-linear estimation. Applications of the proposed algorithms to speech enhancement, non-Gaussian spectral analysis, noise-robust spectrum estimation, and blind channel equalization are demonstrated.

The thesis consists of two parts, the Introduction and the Papers. The Introduction gives background information on the problems at hand, states the motivation of the approaches taken, summarizes the state of the art in the literature, and describes our contributions briefly. The Papers presents our contributions in the form of published papers.

The first part of the Papers (papers A and B) deals with the importance of phase in non-Gaussian signal estimation. Joint MMSE estimators of both magnitude spectra and phase spectra are developed. Application to the enhancement of noisy speech signals results in clearer sounds and higher SNR than frequency-domain MMSE estimators. Here the non-Gaussianity of the speech signal is modeled by the linearity in the phase spectrum, and is enhanced by the joint estimator. This is in contrast to the spectral-domain MMSE estimator (e.g., the Wiener filter), which is zero-phase.
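The zero-phase property of the Wiener filter mentioned above can be illustrated with a minimal sketch. It shows only the standard frequency-domain Wiener gain, not the joint magnitude and phase estimator of the thesis. The PSD shapes, the signal length and the function names are assumptions chosen purely for illustration.

```python
import numpy as np

def wiener_gain(signal_psd, noise_psd):
    """Standard Wiener gain S / (S + N): real-valued and within [0, 1]."""
    return signal_psd / (signal_psd + noise_psd)

def apply_wiener(noisy, signal_psd, noise_psd):
    """Scale each frequency bin of `noisy` by the Wiener gain."""
    noisy_spec = np.fft.rfft(noisy)
    gain = wiener_gain(signal_psd, noise_psd)
    enhanced_spec = gain * noisy_spec
    enhanced = np.fft.irfft(enhanced_spec, n=len(noisy))
    return enhanced, gain, noisy_spec, enhanced_spec

# Hypothetical setup: a random noisy frame and assumed (not estimated) PSDs.
rng = np.random.default_rng(0)
n = 256
noisy = rng.standard_normal(n)
bins = np.arange(n // 2 + 1)
signal_psd = n / (1.0 + (bins / 10.0) ** 2)  # assumed low-pass signal PSD
noise_psd = np.full(bins.shape, 0.25 * n)    # assumed white-noise PSD

enhanced, gain, noisy_spec, enhanced_spec = apply_wiener(noisy, signal_psd, noise_psd)

# The gain is a positive real number in every bin, so only magnitudes change;
# the phase of the noisy observation is passed through untouched.
phase_unchanged = np.allclose(np.angle(enhanced_spec), np.angle(noisy_spec))
print("phase unchanged:", phase_unchanged)
```

Because the gain never alters the phase, any structure carried by the phase spectrum has to be recovered by a different kind of estimator, which is the motivation for estimating magnitude and phase jointly.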
The second part of the Papers (papers C and D) attacks the non-Gaussian estimation problem with a purely temporal-domain approach. It is recognized that a temporal-domain high-resolution non-stationary LMMSE estimator is able to extract structures in both magnitude and phase spectra at a lower complexity. For speech signals, the non-Gaussianity is represented by an excitation sequence with a rapidly varying vari-


[63]  M.R. Raghuveer,et al.  Bispectrum estimation: A digital signal processing framework , 1987, Proceedings of the IEEE.

[64]  John H. L. Hansen,et al.  Evaluation of an auditory masked threshold noise suppression algorithm in normal-hearing and hearing-impaired listeners , 2003, Speech Commun..

[65]  K. Shanmugan,et al.  Random Signals: Detection, Estimation and Data Analysis , 1988 .

[66]  A. B. Poritz,et al.  Linear predictive hidden Markov models and the speech signal , 1982, ICASSP.

[67]  P. Young,et al.  Time series analysis, forecasting and control , 1972, IEEE Transactions on Automatic Control.

[68]  K. Lange A gradient algorithm locally equivalent to the EM algorithm , 1995 .

[69]  A.V. Oppenheim,et al.  Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.

[70]  Dimitrie C. Popescu,et al.  Kalman filtering of colored noise for speech enhancement , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[71]  J. Makhoul,et al.  Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[72]  Chin-Hui Lee Robust linear prediction for speech analysis , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[73]  Christian Kohlschein An introduction to Hidden Markov Models , 2007 .

[74]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[75]  Debasis Sengupta,et al.  Efficient estimation of parameters for non-Gaussian autoregressive processes , 1989, IEEE Trans. Acoust. Speech Signal Process..

[76]  D. Luenberger,et al.  Estimation of structured covariance matrices , 1982, Proceedings of the IEEE.

[77]  Chunjian Li,et al.  Inter-frequency dependency in mmse speech enhancement , 2004, Proceedings of the 6th Nordic Signal Processing Symposium, 2004. NORSIG 2004..

[78]  N. Gordon,et al.  Novel approach to nonlinear/non-Gaussian Bayesian state estimation , 1993 .

[79]  Hagai Attias,et al.  Independent Factor Analysis , 1999, Neural Computation.

[80]  Zoubin Ghahramani,et al.  A Unifying Review of Linear Gaussian Models , 1999, Neural Computation.

[81]  J.E. Mazo,et al.  Digital communications , 1985, Proceedings of the IEEE.

[82]  Robert M. Gray,et al.  A Fake Process Approach to Data Compression , 1978, IEEE Trans. Commun..

[83]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[84]  Jerry M. Mendel,et al.  ARMA parameter estimation using only output cumulants , 1990, IEEE Trans. Acoust. Speech Signal Process..

[85]  Przemyslaw Dymarski,et al.  Selection of excitation vectors for the CELP coders , 1994, IEEE Trans. Speech Audio Process..

[86]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[87]  Ehud Weinstein,et al.  Signal enhancement using single and multi-sensor measurements , 1990 .

[88]  Ahmet M. Kondoz,et al.  Digital Speech: Coding for Low Bit Rate Communication Systems , 1995 .

[89]  L. Baum,et al.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .

[90]  T. Başar,et al.  A New Approach to Linear Filtering and Prediction Problems , 2001 .

[91]  X. Zhuang,et al.  Gaussian mixture density modeling of non-Gaussian source for autoregressive process , 1995, IEEE Trans. Signal Process..

[92]  Ehud Weinstein,et al.  Iterative-batch and sequential algorithms for single microphone speech enhancement , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[93]  Arie Yeredor,et al.  The extended least squares criterion: minimization algorithms and applications , 2001, IEEE Trans. Signal Process..

[94]  D. Burshtein,et al.  Joint modeling and maximum-likelihood estimation of pitch and linear prediction coefficient parameters. , 1992, The Journal of the Acoustical Society of America.

[95]  Petar M. Djuric,et al.  Parameter estimation for non-Gaussian autoregressive processes , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[96]  George Casella,et al.  Improving the EM Algorithm , 1992 .

[97]  Li Wugao,et al.  Modeling and simulation of non-Gaussian correlated clutter , 1996, Proceedings of International Radar Conference.

[98]  M. Rosenblatt,et al.  A Fourth Order Deconvolution Technique for Nongaussian Linear Processes. , 1982 .

[99]  K. Y. Lee,et al.  On the applications of the interacting multiple model algorithm for enhancing noisy speech , 2000, IEEE Trans. Speech Audio Process..

[100]  Rangasami L. Kashyap,et al.  Recursive estimation of images using non-Gaussian autoregressive models , 1998, IEEE Trans. Image Process..

[101]  Alan V. Oppenheim,et al.  Parameter estimation for autoregressive Gaussian-mixture processes: the EMAX algorithm , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[102]  Bishnu S. Atal,et al.  Predictive Coding of Speech at Low Bit Rates , 1982, IEEE Trans. Commun..

[103]  Geoffrey E. Hinton,et al.  Parameter estimation for linear dynamical systems , 1996 .

[104]  Fred C. Schweppe,et al.  Evaluation of likelihood functions for Gaussian signals , 1965, IEEE Trans. Inf. Theory.

[105]  John H. L. Hansen,et al.  Constrained iterative speech enhancement with application to speech recognition , 1991, IEEE Trans. Signal Process..

[106]  N. Wiener The Wiener RMS (Root Mean Square) Error Criterion in Filter Design and Prediction , 1949 .

[107]  Olivier Cappé,et al.  Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor , 1994, IEEE Trans. Speech Audio Process..

[108]  Richard A. Davis,et al.  Time Series: Theory and Methods , 2013 .

[109]  Sabine Van Huffel,et al.  Total least squares problem - computational aspects and analysis , 1991, Frontiers in applied mathematics.

[110]  R. Kohn,et al.  Bayesian estimation of an autoregressive model using Markov chain Monte Carlo , 1996 .

[111]  S. Roberts,et al.  Variational Bayes for non-Gaussian autoregressive models , 2000, Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501).

[112]  Jerry M. Mendel,et al.  Tutorial on higher-order statistics (spectra) in signal processing and system theory: theoretical results and some applications , 1991, Proc. IEEE.

[113]  Enzo Mumolo,et al.  Volterra adaptive prediction of speech with application to waveform coding , 1995, Eur. Trans. Telecommun..

[114]  L. Breuer Introduction to Stochastic Processes , 2022, Statistical Methods for Climate Scientists.

[115]  A. Walden,et al.  Spectral analysis for physical applications : multitaper and conventional univariate techniques , 1996 .

[116]  S. Godsill,et al.  Simple alternatives to the Ephraim and Malah suppression rule for speech enhancement , 2001, Proceedings of the 11th IEEE Signal Processing Workshop on Statistical Signal Processing (Cat. No.01TH8563).

[117]  Chunjian Li,et al.  Integrating Kalman filtering and multi-pulse coding for speech enhancement with a non-stationary model of the speech signal , 2004, Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004..

[118]  Schuyler Quackenbush,et al.  Objective measures of speech quality , 1995 .

[119]  Ehud Weinstein,et al.  Iterative and sequential Kalman filter-based speech enhancement algorithms , 1998, IEEE Trans. Speech Audio Process..

[120]  Robert M. Gray,et al.  Toeplitz and Circulant Matrices: A Review , 2005, Found. Trends Commun. Inf. Theory.

[121]  Ehud Weinstein,et al.  Maximum likelihood noise cancellation using the EM algorithm , 1989, IEEE Trans. Acoust. Speech Signal Process..

[122]  Simon J. Godsill,et al.  Robust noise reduction for speech and audio signals , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[123]  J. Cadzow Maximum Entropy Spectral Analysis , 2006 .

[124]  C. J. Wellekens,et al.  Explicit time correlation in hidden Markov models for speech recognition , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[125]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[126]  K. Siwiak,et al.  Ultra-wide band radio: the emergence of an important new technology , 2001, IEEE VTS 53rd Vehicular Technology Conference, Spring 2001. Proceedings (Cat. No.01CH37202).

[127]  Geoffrey E. Hinton,et al.  Variational Learning for Switching State-Space Models , 2000, Neural Computation.

[128]  Kah-Chye Tan,et al.  Kalman-filtering speech enhancement method based on a voiced-unvoiced speech model , 1999, IEEE Trans. Speech Audio Process..