Eigenvalue Criterion-Based Feature Selection in Principal Component Analysis of Speech

This article presents an approach for selecting a limited subset of the most relevant, information-rich speech data from the full set of training data. The proposed method uses Principal Component Analysis (PCA) to optimally select a lower-dimensional data subset with similar variances. Three selection algorithms based on an eigenvalue criterion are presented. The first operates on the data at the level of entire speech recordings. The second additionally segments each recording into blocks of experimentally chosen size, which in principle splits a recording into several smaller blocks of higher or lower information content. Finally, the third analyzes all speech recordings at the feature-vector level. These three approaches thus represent criterion-based selection at three granularities, from the coarsest to the finest data level. The main aim of the presented experiments is to show that PCA trained on the limited data subset achieves comparable or even better results than PCA trained on the entire speech corpus. This approach can therefore radically speed up PCA training while requiring far less memory and computation. All methods are evaluated on a Slovak phoneme-based large-vocabulary continuous speech recognition task.
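The exact eigenvalue criterion is not specified in this abstract. As an illustrative sketch only, the following Python snippet assumes that each data unit (a recording, a block, or a batch of feature vectors) is scored by the variance captured by the leading eigenvalues of its covariance matrix, and that the highest-scoring units are kept for PCA training; the names `eigenvalue_score` and `select_richest` are hypothetical.

```python
import numpy as np

def eigenvalue_score(features, n_top=5):
    """Score one data unit (a frames x dims feature matrix) by the total
    variance captured by its n_top largest covariance eigenvalues.
    NOTE: this is an assumed criterion for illustration, not necessarily
    the one used in the paper."""
    cov = np.cov(features, rowvar=False)         # dims x dims covariance
    eigvals = np.linalg.eigvalsh(cov)            # ascending order
    return eigvals[::-1][:n_top].sum()           # sum of leading eigenvalues

def select_richest(units, n_select):
    """Rank data units by their eigenvalue score and keep the n_select
    most information-rich ones for PCA training."""
    scores = np.array([eigenvalue_score(u) for u in units])
    order = np.argsort(scores)[::-1]             # highest score first
    return [units[i] for i in order[:n_select]]

# Toy example: 10 "recordings" of 200 frames of 13-dimensional features,
# with increasing variance so later recordings carry more information.
rng = np.random.default_rng(0)
units = [rng.normal(scale=1.0 + 0.3 * i, size=(200, 13)) for i in range(10)]
subset = select_richest(units, n_select=3)
```

Depending on which of the three granularities is used, `units` would hold whole recordings, fixed-size blocks cut from each recording, or individual feature-vector batches; the scoring and ranking step stays the same.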