Principal discriminants analysis for small-sample-size problems: application to chemical sensing

Two dimensionality reduction techniques are widely used to analyze data from chemical sensor arrays: Fisher's linear discriminants analysis (LDA) and principal components analysis (PCA). LDA finds the directions of maximum discrimination in classification problems, but tends to overfit when the ratio of training samples to dimensionality is low, as is commonly the case in chemical sensor array problems. PCA is more robust to overfitting but, being a variance model, fails to capture discriminatory information carried by low-variance sensors. In this article, we propose a hybrid model, termed principal discriminants analysis (PDA), which incorporates both the LDA and PCA criteria by means of a regularization parameter. The model is characterized on a synthetic dataset and validated with experimental data from an array of 15 metal-oxide sensors exposed to five varieties of roasted coffee beans. Our results show that PDA provides higher predictive accuracy than either LDA or PCA alone. In addition, the model is able to find a trade-off between discriminant-based and variance-based projections, depending on where the information is located in the distribution of the data.
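
To make the idea of blending the two criteria through a single parameter concrete, the following is a minimal sketch of one possible way to do so. It is an illustrative assumption and not necessarily the formulation used in this article. The names `scatter_matrices`, `hybrid_projection`, and `gamma` are hypothetical. The sketch interpolates between Fisher's criterion (`gamma = 0`) and the PCA criterion (`gamma = 1`) by mixing the scatter matrices in the numerator and denominator of a generalized Rayleigh quotient.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return within-class, between-class, and total scatter matrices."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    # The total scatter decomposes as S_t = S_w + S_b.
    return S_w, S_b, S_w + S_b


def hybrid_projection(X, y, gamma, n_components):
    """Hypothetical blend of LDA and PCA criteria (an assumption, not the
    article's stated method).

    Maximizes w' A w / w' B w with
        A = (1 - gamma) * S_b + gamma * S_t
        B = (1 - gamma) * S_w + gamma * I
    so gamma = 0 gives Fisher's LDA and gamma = 1 gives PCA.
    B must be positive definite (always true for gamma > 0).
    """
    S_w, S_b, S_t = scatter_matrices(X, y)
    d = X.shape[1]
    A = (1.0 - gamma) * S_b + gamma * S_t
    B = (1.0 - gamma) * S_w + gamma * np.eye(d)

    # Symmetric whitening by B^(-1/2) turns the generalized eigenproblem
    # A w = lambda B w into an ordinary symmetric one.
    evals_B, evecs_B = np.linalg.eigh(B)
    B_inv_sqrt = evecs_B @ np.diag(1.0 / np.sqrt(evals_B)) @ evecs_B.T
    M = B_inv_sqrt @ A @ B_inv_sqrt

    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1][:n_components]
    W = B_inv_sqrt @ evecs[:, order]
    # Normalize each projection vector to unit length.
    return W / np.linalg.norm(W, axis=0)
```

In this sketch, `gamma = 0` requires the within-class scatter matrix to be nonsingular, which is precisely the condition that fails in small-sample-size problems; any `gamma > 0` adds a multiple of the identity and keeps the denominator matrix invertible. In practice, `gamma` would presumably be chosen by cross-validation on predictive accuracy.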