Non-parametric estimation and correction of non-linear distortion in speech systems

The performance of speech systems such as speaker recognition degrades drastically when non-linear distortion causes a mismatch between training and testing conditions. This paper describes a technique to estimate and correct such non-linear distortion in speech. The focus is on constrained restoration of degraded speech; that is, distortion in the test speech is undone relative to the training speech. Restoration is a two-step process: estimation followed by inversion. The non-linearity is estimated in the form of a look-up table by a process of statistical matching using a reference speech template. This statistical matching technique provides a very good estimate of the true non-linear characteristic, and the process is robust, computationally efficient, and universally applicable. Speaker-ID experiments using artificially corrupted test speech showed significant improvement in performance after the test speech was 'cleaned' using this technique. The restoration process itself does not introduce appreciable distortion.
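
The abstract does not spell out the matching procedure, so the following is only an illustrative sketch of one plausible reading: quantile (histogram) matching between the distorted test signal and a reference signal with the same underlying amplitude distribution. It assumes the distortion is memoryless and monotonically non-decreasing. The names build_lut, restore, and n_levels, the tanh distortion, and the Gaussian samples standing in for speech are all hypothetical choices made for this example, not details taken from the paper.

```python
import bisect
import math
import random


def build_lut(reference, distorted, n_levels=64):
    """Pair equal-probability quantiles of the distorted and reference signals.

    Returns (distorted_levels, reference_levels). Entry k of each list is the
    amplitude at cumulative probability (k + 0.5) / n_levels in that signal.
    Read forward (reference -> distorted) the table estimates the non-linearity;
    read backward (distorted -> reference) it inverts it.
    """
    ref_sorted = sorted(reference)
    dis_sorted = sorted(distorted)

    def quantile(sorted_values, p):
        index = min(int(p * len(sorted_values)), len(sorted_values) - 1)
        return sorted_values[index]

    probs = [(k + 0.5) / n_levels for k in range(n_levels)]
    distorted_levels = [quantile(dis_sorted, p) for p in probs]
    reference_levels = [quantile(ref_sorted, p) for p in probs]
    return distorted_levels, reference_levels


def restore(sample, distorted_levels, reference_levels):
    """Map one distorted sample back by linear interpolation in the table."""
    i = bisect.bisect_left(distorted_levels, sample)
    if i == 0:
        return reference_levels[0]   # below the table: clamp to first entry
    if i == len(distorted_levels):
        return reference_levels[-1]  # above the table: clamp to last entry
    # bisect_left guarantees x0 < sample <= x1, so x1 - x0 is never zero.
    x0, x1 = distorted_levels[i - 1], distorted_levels[i]
    y0, y1 = reference_levels[i - 1], reference_levels[i]
    return y0 + (y1 - y0) * (sample - x0) / (x1 - x0)


# Toy demonstration with synthetic data (not speech, not the paper's setup).
rng = random.Random(0)
reference = [rng.gauss(0.0, 0.3) for _ in range(20000)]   # "training" stand-in
clean_test = [rng.gauss(0.0, 0.3) for _ in range(20000)]  # unseen clean signal
distorted_test = [math.tanh(2.0 * x) for x in clean_test] # assumed distortion

d_levels, r_levels = build_lut(reference, distorted_test)
restored = [restore(y, d_levels, r_levels) for y in distorted_test]

n = len(clean_test)
mse_distorted = sum((y - x) ** 2 for x, y in zip(clean_test, distorted_test)) / n
mse_restored = sum((z - x) ** 2 for x, z in zip(clean_test, restored)) / n
print("MSE before restoration:", mse_distorted)
print("MSE after restoration: ", mse_restored)
```

In this sketch the table never sees the clean test signal; it relies only on the reference and the distorted signal sharing the same underlying distribution, which mirrors the constrained setting described above where the test speech is restored relative to the training speech. Samples outside the table range are clamped, so extreme amplitudes are recovered less accurately than those near the centre of the distribution.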