A new linear classification method for an EEG-based Brain-Computer Interface

A new linear classification method is proposed for an EEG-based Brain-Computer Interface (BCI). The proposed method models a separate variance-covariance matrix for each class and therefore has an advantage over Linear Discriminant Analysis.

Introduction

Linear classification methods, like Linear Discriminant Analysis (LDA), need fewer examples to obtain a reliable classifier (Pfurtscheller et al. 2000). LDA has been used in the past because it produces an output that is continuous in time as well as in amplitude. LDA has also been applied successfully to many different EEG parameters, such as bandpower values (ERD), common spatial patterns (CSP) (Pfurtscheller et al. 2000), and adaptive autoregressive (AAR) parameters (Schlögl, 2000). However, the LDA output sometimes appeared biased towards one class. This is surprising, because LDA provides the weights for the "best" linear separation of the data. A more detailed analysis of some (unpublished) results suggested that the different variability of each class causes this bias in the LDA output. This report describes in detail an alternative linear classification method that has no bias due to different variability.

Linear discriminant analysis (LDA)

Let us first recall linear discriminant analysis. Assume each data element si has m features, so that si is one point in an m-dimensional feature space. The number of examples is n, and each example is assigned to one of two classes C = {0, 1}. Then S is a matrix of size n×m and C is a vector of size n. N0 and N1 are the numbers of elements of class 0 and class 1, respectively. The mean μc of each class c is the mean over all si, with i ranging over all elements of class c. The total mean μ of the data is

    μ = (N0·μ0 + N1·μ1) / (N0 + N1)    (1.1)

The covariance matrix C of the data is the expectation

    C = E[(s − μ)·(s − μ)^T]    (1.2)

Then the weight vector w and the offset w0 are

    w = C^-1 · (μ1 − μ0)    (2.1)
    w0 = −μ · w    (2.2)

The weight vector w determines a separating hyperplane in the m-dimensional feature space. The normal distance D(x) of any element x is

    D(x) = x · w + w0    (3.1)
         = (x − μ) · w    (3.2)
         = (x − μ) · C^-1 · (μ1 − μ0)    (3.3)

If D(x) is larger than 0, x is assigned to class 1; if D(x) is smaller than 0, x is assigned to class 0. D(x) = 0 defines all elements x that lie on the separating hyperplane.
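
The following is a minimal sketch of these equations in Python/NumPy. It follows the report's definitions (covariance of the whole data around the total mean, eqs. (1.1)-(3.3)); the function names and the NumPy dependency are assumptions for illustration, not part of the report.

```python
import numpy as np

def train_lda(S, c):
    """Train the LDA classifier of eqs. (1.1)-(2.2).

    S : (n, m) array of feature vectors; c : (n,) array of labels in {0, 1}.
    Returns the weight vector w and the offset w0.
    """
    mu0 = S[c == 0].mean(axis=0)
    mu1 = S[c == 1].mean(axis=0)
    N0, N1 = np.sum(c == 0), np.sum(c == 1)
    mu = (N0 * mu0 + N1 * mu1) / (N0 + N1)   # total mean, eq. (1.1)
    C = np.cov(S, rowvar=False)              # covariance of the data, eq. (1.2)
    w = np.linalg.solve(C, mu1 - mu0)        # w = C^-1 (mu1 - mu0), eq. (2.1)
    w0 = -mu @ w                             # offset, eq. (2.2)
    return w, w0

def lda_discriminant(x, w, w0):
    """D(x) = x.w + w0, eq. (3.1): class 1 if D(x) > 0, class 0 otherwise."""
    return x @ w + w0
```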
Mahalanobis distance based classifier (MDBC)

Assume again that each data element s has m features, so that an element si is one point in an m-dimensional feature space. The number of examples is n, and each example is assigned to one of two classes C = {0, 1}. Then S is a matrix of size n×m and C is a vector of size n. N0 and N1 are the numbers of elements of class 0 and class 1, respectively. The mean μc of each class c is the mean over all si, with i ranging over all elements of class c. The covariance matrix Cc of each class c is the expectation

    Cc = E[(s − μc)·(s − μc)^T]    (4.1)

It can be calculated as the mean over all elements si within class c. The mean μc and the covariance Cc determine the multivariate normal probability density function (pdf) that corresponds to class c, and any point in the m-dimensional feature space can be associated with a certain distance to each class c. Because a multivariate normal distribution N(μc, Cc) is assumed, the Mahalanobis distance (Mahalanobis, 1936) is an appropriate distance function. The Mahalanobis distance dc of a point x in the feature space to the multivariate normal distribution N(μc, Cc) is defined by

    dc^2(x) = (x − μc) · Cc^-1 · (x − μc)^T    (5.1)

Furthermore, the difference of the distances, D(x) = d1(x) − d0(x), can be calculated:

    D(x) = d1(x) − d0(x)    (6.1)
         = ((x − μ1) · C1^-1 · (x − μ1)^T)^(1/2) − ((x − μ0) · C0^-1 · (x − μ0)^T)^(1/2)    (6.2)

If D(x) is larger than 0, x is closer to the pdf of class 0; if D(x) is smaller than 0, x is closer to class 1. D(x) is a discrimination function based on the Mahalanobis distance, an MD-based discriminant. An MD-based classifier (MDBC) is obtained by comparing D(x) against the threshold 0.
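
A corresponding sketch of the MDBC, again in Python/NumPy and under the same assumptions (hypothetical function names; the class covariances of eq. (4.1) estimated as sample covariances):

```python
import numpy as np

def train_mdbc(S, c):
    """Estimate the class means and inverted class covariances (eq. 4.1).

    The O(m^3) matrix inversions are done once, offline; only the inverted
    matrices are needed for the online classification.
    """
    params = []
    for cls in (0, 1):
        Sc = S[c == cls]
        mu_c = Sc.mean(axis=0)
        Cc = np.cov(Sc, rowvar=False)             # class covariance, eq. (4.1)
        params.append((mu_c, np.linalg.inv(Cc)))
    return params

def mdbc_discriminant(x, params):
    """D(x) = d1(x) - d0(x), eqs. (5.1) and (6.1)-(6.2).

    D(x) > 0: x is closer to class 0; D(x) < 0: x is closer to class 1.
    """
    d = []
    for mu_c, Cc_inv in params:
        diff = x - mu_c
        d.append(np.sqrt(diff @ Cc_inv @ diff))   # Mahalanobis distance, eq. (5.1)
    return d[1] - d[0]
```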

Discussion

Both LDA and MDBC are based only on the mean and the covariance of the data. In other words, only second-order statistics are used; both are linear methods, and higher-order statistics and non-linearities are not considered. The main difference between LDA and MDBC is that LDA uses the variance-covariance matrix of the whole data, whereas MDBC uses a separate covariance matrix for each class. Because MDBC takes the different variance-covariances into account, the separating hyperplane, defined by D(x) = 0, can become a curved hypersurface in the m-dimensional feature space.

In both methods, LDA and MDBC, the largest computational effort is due to the matrix inversions, which are of order O(m^3). However, the matrix inversion can be performed once during the offline analysis; the inverted matrix can then be used for the online classification. For the online classifier, the computational effort increases from O(m) for LDA (a vector product) to O(m^2) for MDBC (a matrix-vector product). As long as m is limited to some hundreds of features, the computational effort is hardly an argument against MDBC.
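
To illustrate the difference on data with unequal class covariances, here is a minimal synthetic check, reusing the train_lda/train_mdbc sketches above. The simulated distributions, the seed, and all numbers are hypothetical and serve only as an example; note the opposite sign conventions of the two discriminants.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-class data with clearly different covariances.
S0 = rng.multivariate_normal([0, 0], np.eye(2), size=500)       # class 0: small variance
S1 = rng.multivariate_normal([2, 2], 9 * np.eye(2), size=500)   # class 1: large variance
S = np.vstack([S0, S1])
c = np.r_[np.zeros(500), np.ones(500)].astype(int)

w, w0 = train_lda(S, c)
params = train_mdbc(S, c)

# Fraction of each true class that is assigned to class 1.
lda_to_1 = (S @ w + w0) > 0                                          # LDA: class 1 if D(x) > 0
mdbc_to_1 = np.array([mdbc_discriminant(x, params) for x in S]) < 0  # MDBC: class 1 if D(x) < 0
for name, out in (("LDA ", lda_to_1), ("MDBC", mdbc_to_1)):
    print(name, "assigned to class 1 | true class 0:", out[c == 0].mean(),
          "| true class 1:", out[c == 1].mean())
```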

Conclusion

If the covariances of the two classes are similar, LDA and MDBC yield the same results. If the covariance matrices differ between the classes, MDBC takes this into account. Hence, the bias due to different class variances is removed with MDBC. Therefore, the MDBC classifier is preferable to LDA.

References

Mahalanobis PC. On the generalised distance in statistics. Proc. Natl. Institute of Science of India. 1936;2:49.

Pfurtscheller G, Neuper C, Guger C, Harkam W, Ramoser H, Schlögl A, Obermaier B, Pregenzer M. Current trends in Graz Brain-Computer Interface (BCI) research. IEEE Trans Rehabil Eng. 2000 Jun;8(2):216-9.

Schlögl A. The electroencephalogram and the adaptive autoregressive model: theory and applications. Shaker Verlag, Aachen, Germany, 2000.
