Some solutions to the missing feature problem in data classification, with application to noise robust ASR

We address the theoretical and practical issues involved in automatic speech recognition (ASR) when some of the observation data for the target signal is masked by other signals. Techniques discussed range from simple missing data imputation to Bayesian optimal classification. We have developed the Bayesian approach because it allows prior knowledge to be incorporated naturally into the recognition process, thereby permitting us to go beyond the simple "integrate over missing data" or "marginals" approach reported elsewhere, which we show to be inadequate for dealing with realistic patterns of missing data. After deriving general techniques for recognition with missing data, we formulate them in the context of an HMM-based CSR system. This scheme is evaluated under both random and more realistic patterns of missing data, with speech from the DARPA RM corpus and noise from NOISEX. We find that a key problem in real-world recognition with missing data is that efficient ASR requires data vector components to be independent, and incomplete data cannot be orthogonalised in the usual way by projection. We show that using spectral peaks only can provide an effective solution to this problem.
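As a rough illustration of the simple "marginals" baseline mentioned above (not the Bayesian scheme developed in this work), the sketch below scores an observation against per-class Gaussians with diagonal covariance, using only the components flagged as present. It also shows why independent components matter: with a diagonal covariance, integrating over the missing components reduces to dropping them from the product. All names (`marginal_log_likelihood`, `classify`, the `present` mask, the toy class models) are hypothetical, and equal class priors are assumed.

```python
import math


def log_gauss_1d(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def marginal_log_likelihood(obs, present, means, variances):
    """Log-likelihood of the present components only.

    Assumes a diagonal covariance, so components are independent and
    integrating the joint density over the missing components leaves the
    product of the 1-D densities of the present ones.
    """
    return sum(
        log_gauss_1d(x, m, v)
        for x, p, m, v in zip(obs, present, means, variances)
        if p
    )


def classify(obs, present, class_models):
    """Return the class with the highest marginal likelihood (equal priors assumed)."""
    return max(
        class_models,
        key=lambda name: marginal_log_likelihood(obs, present, *class_models[name]),
    )


# Toy models (illustrative values only): name -> (means, variances).
models = {
    "a": ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
    "b": ([3.0, 3.0, 3.0], [1.0, 1.0, 1.0]),
}

# The second component is treated as masked by noise and ignored.
obs = [0.2, 50.0, 0.1]
label = classify(obs, [True, False, True], models)
```

In this toy case the masked component would otherwise dominate the score, so the decision depends on the mask. The sketch uses nothing beyond the mask itself, for example no bounds on the masked values, which is the kind of prior knowledge a Bayesian formulation can incorporate.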