Prediction of the quality ratings of tracheoesophageal speech using adaptive time-frequency representations

Tracheoesophageal (TE) speech is the most common means of communication for people whose larynx has been removed due to cancer. TE speech is often characterized by poor quality, and proper rehabilitation is required to improve the communicative abilities of TE speakers. Objective measurements of speech quality that can assist speech pathologists in assessing the quality of a patient's voice are therefore important for proper prescription and monitoring of the rehabilitation. Conventional measures such as glottal noise, fundamental frequency, and linear prediction parameters are not reliable because they do not adequately capture the non-stationary characteristics of TE speech. This paper proposes an adaptive time-frequency approach to estimate features that effectively represent the quality of TE speech. Using a database of speech samples collected from 35 TE speakers, it is shown that the features extracted using the adaptive time-frequency representations perform significantly better than the conventional measures.