Linear dimensionality reduction for classification via a sequential Bayes error minimisation with an application to flow meter diagnostics

We propose a supervised linear dimensionality reduction algorithm. The algorithm reduces the dimensionality of data to K-1 for the K-class problem. The linearly reduced data is well-suited for Bayesian classification. Experiments on UCI datasets are performed for the proposed and existing algorithms. The applicability of the algorithm to flow meter diagnostics is also demonstrated.

Supervised linear dimensionality reduction (LDR) performed prior to classification often improves classification accuracy by reducing overfitting and removing multicollinearity. If a Bayes classifier is to be used, then reduction to a dimensionality of K-1 is necessary and sufficient to preserve the classification information in the original feature space for the K-class problem. However, most existing algorithms do not identify an optimal dimensionality to which to reduce the data, so classification information can be lost in the reduced space if K-1 dimensions are used. In this paper, we present a novel LDR technique that reduces the dimensionality of the original data to K-1 such that it is well-primed for Bayesian classification. This is done by sequentially constructing linear classifiers that minimise the Bayes error via a gradient descent procedure, under an assumption of normality. We experimentally validate the proposed algorithm on 10 UCI datasets. Our algorithm is shown to be superior in terms of classification accuracy to existing algorithms, including LDR based on Fisher's criterion and the Chernoff criterion. The applicability of our algorithm is then demonstrated by employing it to diagnose the health states of two ultrasonic flow meters. As with the UCI datasets, the proposed algorithm is found to outperform the existing algorithms, achieving classification accuracies of 99.4% and 97.5% on the two flow meters. Such high classification accuracies on the flow meters promise significant cost benefits in oil and gas operations.
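To make the idea concrete, the following is a minimal Python sketch of the kind of procedure described above: projection directions are found one at a time, each chosen to minimise an estimate of the Bayes error of the one-dimensional projected data under per-class normality, until K-1 directions are obtained. The numerical integration of the error, the derivative-free optimiser (scipy's Nelder-Mead standing in for the paper's closed-form gradient descent), the Gram-Schmidt deflation between steps, and the names bayes_error_1d and sequential_ldr are all illustrative assumptions rather than the authors' exact formulation.

```python
# Illustrative sketch (not the paper's exact method): greedily find K-1
# directions, each minimising a numerically integrated estimate of the
# Bayes error of the 1-D projected data under per-class normality.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def bayes_error_1d(w, X, y, classes, priors, grid_size=512):
    """Estimate the Bayes error of X projected onto direction w, assuming
    each class is Gaussian in the projected one-dimensional space."""
    w = w / np.linalg.norm(w)
    z = X @ w
    means = np.array([z[y == c].mean() for c in classes])
    stds = np.array([z[y == c].std(ddof=1) for c in classes]) + 1e-9
    grid = np.linspace(z.min() - 3 * stds.max(),
                       z.max() + 3 * stds.max(), grid_size)
    # Bayes error = 1 - integral of max_k pi_k N(x; m_k, s_k^2) dx,
    # approximated here by a Riemann sum on the grid.
    dens = np.array([p * norm.pdf(grid, m, s)
                     for p, m, s in zip(priors, means, stds)])
    return 1.0 - np.sum(dens.max(axis=0)) * (grid[1] - grid[0])


def sequential_ldr(X, y, n_components=None, n_restarts=5, seed=0):
    """Return a (d, K-1) projection matrix built one direction at a time."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()
    d = X.shape[1]
    n_components = n_components or len(classes) - 1
    rng = np.random.default_rng(seed)
    W = []

    def deflate(w):
        # Keep each new direction orthogonal to those already found
        # (an illustrative choice; the deflation scheme is assumed).
        for u in W:
            w = w - (w @ u) * u
        return w

    for _ in range(n_components):
        def objective(w):
            w = deflate(w)
            if np.linalg.norm(w) < 1e-12:
                return 1.0  # degenerate direction, worst possible error
            return bayes_error_1d(w, X, y, classes, priors)

        # Derivative-free optimisation from several random starts stands in
        # for the paper's gradient descent procedure.
        results = [minimize(objective, rng.normal(size=d),
                            method="Nelder-Mead") for _ in range(n_restarts)]
        w = deflate(min(results, key=lambda r: r.fun).x)
        W.append(w / np.linalg.norm(w))
    return np.column_stack(W)
```

In this sketch, the reduced data for a subsequent (quadratic) Bayes classifier would be obtained as Z = X @ sequential_ldr(X, y), giving K-1 features for the K-class problem.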
