Method for selecting training data based on non-uniform sampling for speech feature vector quantization in speech recognition
1. Technical field of the invention defined in the claims: The present invention relates to a method for selecting training data based on non-uniform sampling in speech feature vector quantization for speech recognition, and to a computer-readable recording medium storing a program for implementing the method.
2. Technical problem the invention attempts to solve: The invention aims to provide a training data selection method based on non-uniform sampling, and a computer-readable recording medium storing a program for implementing the method. The training data used to train the speech feature vector quantizer for speech recognition is selected by non-uniform sampling that takes characteristics of speech into account, for example the occurrence frequency of each phoneme in the language.
3. Summary of the solution: The training data selection method comprises: a forced alignment step of obtaining pronunciation information for each phoneme by forced alignment of the received sample speech data; a phoneme-specific occurrence list generation step of generating a per-phoneme occurrence list from the obtained pronunciation information; a phoneme occurrence frequency ratio calculation step of calculating the occurrence frequency ratio of each phoneme in the language from the generated occurrence list; and a training data derivation step of deriving, with reference to the calculated phoneme occurrence frequency ratio, training data that minimizes the error between the overall frequency and the occurrence frequency ratio for each phoneme.
4. Important use of the invention: The invention is used in fields such as speech recognition.
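The four steps of the solution can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not state the exact error criterion, so the sketch assumes that forced alignment has already produced `(phoneme_label, feature_vector)` pairs and that a largest-remainder quota per phoneme is an acceptable way to keep the selected data close to the measured frequency ratio. The function names and the `budget` parameter are hypothetical.

```python
import random
from collections import Counter, defaultdict


def phoneme_frequency_ratio(aligned_frames):
    """Occurrence list and frequency ratio per phoneme.

    aligned_frames: list of (phoneme_label, feature_vector) pairs,
    assumed here to be the output of a forced-alignment step.
    """
    counts = Counter(label for label, _ in aligned_frames)
    total = sum(counts.values())
    return {ph: n / total for ph, n in counts.items()}


def allocate_quota(ratio, budget):
    """Integer quota per phoneme that sums to `budget` and tracks `ratio`.

    Assumption: largest-remainder rounding stands in for the
    unspecified error-minimization step of the abstract.
    """
    exact = {ph: r * budget for ph, r in ratio.items()}
    quota = {ph: int(x) for ph, x in exact.items()}
    leftover = budget - sum(quota.values())
    # Hand the remaining slots to the phonemes with the largest remainders.
    by_remainder = sorted(exact, key=lambda ph: exact[ph] - quota[ph], reverse=True)
    for ph in by_remainder[:leftover]:
        quota[ph] += 1
    return quota


def select_training_data(aligned_frames, budget, seed=0):
    """Draw `budget` vectors so each phoneme's share follows its frequency ratio."""
    rng = random.Random(seed)
    quota = allocate_quota(phoneme_frequency_ratio(aligned_frames), budget)
    by_phoneme = defaultdict(list)
    for label, vec in aligned_frames:
        by_phoneme[label].append(vec)
    selected = []
    for ph, n in quota.items():
        pool = by_phoneme[ph]
        # Guard against a quota larger than the available pool.
        for vec in rng.sample(pool, min(n, len(pool))):
            selected.append((ph, vec))
    return selected
```

The selected pairs would then be passed to whatever codebook training procedure the quantizer uses (for example over MFCC vectors); that stage is outside this sketch.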
Speech recognition, speech feature vector, vector quantization, MFCC (Mel-Frequency Cepstral Coefficient), non-uniform sampling, training data selection, phoneme occurrence frequency by language