A method of extracting time-varying acoustic features effective for speech recognition

Feature extraction plays a substantial role in automatic speech recognition systems. This paper proposes a method for extracting time-varying acoustic features that are effective for speech recognition. The method is discussed from two aspects: enhancement of the speech power spectrum, and discriminative extraction of time-varying features, which employs subphonetic units called demiphonemes to distinguish non-steady labels from steady ones. We confirm the potential of the method by applying it to spoken word recognition. The results indicate that the proposed features improve recognition scores over ordinary features such as delta-mel-cepstra provided by a well-known software tool.
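
For reference, the delta-mel-cepstra named above as the ordinary baseline are conventionally obtained by a linear regression over neighbouring frames. The following is a minimal sketch of that conventional computation only, not of the proposed method. The function name `delta`, the window width of 2, and the replication of the first and last frames at the edges are assumptions made for illustration, since the abstract does not specify the tool or its settings.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients (illustrative sketch).

    frames: list of equal-length feature vectors, one per time frame
            (for example mel-cepstral coefficients).
    window: number of frames used on each side (assumed value, not from the paper).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    dim = len(frames[0])
    # Normalising term of the regression: 2 * sum of n^2 for n = 1..window
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = [0.0] * dim
        for n in range(1, window + 1):
            # Edge handling (assumption): replicate the first/last frame
            ahead = frames[min(t + n, num_frames - 1)]
            behind = frames[max(t - n, 0)]
            for k in range(dim):
                row[k] += n * (ahead[k] - behind[k])
        deltas.append([v / denom for v in row])
    return deltas


# Usage: a one-dimensional feature rising by 1.0 per frame
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
print(delta(ramp)[2])  # interior frame of a unit ramp: [1.0]
```

A steady segment yields deltas near zero under this formula, while a transition yields non-zero values, which is the sense in which such features are time-varying.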