Evaluation of Bayes, ICA, PCA and SVM Methods for Classification

In this paper, we introduce the basic concepts of several state-of-the-art classification methods, including independent component analysis (ICA), principal component analysis (PCA), the Bayes method, and the support vector machine (SVM), also known as a kernel machine. We discuss their function in classification and evaluate their performance in different applications.

Paper presented at the RTO SET Symposium on "Target Identification and Recognition Using RF Systems", held in Oslo, Norway, 11-13 October 2004, and published in RTO-MP-SET-080.

1 STATISTICAL CLASSIFICATION

Classification means resolving the class of an object, e.g., a ground vehicle vs. an aircraft. Recognition means determining whether the ground vehicle is a truck, a school bus, or a tank. Identification means identifying the type or model of the target (a T72 tank vs. an M60 tank). Statistical classification applies statistical pattern recognition methods to classification, recognition, and identification [1].

A pattern is a characteristic of an observation, such as a speech signal or a human face image. A structural characteristic extracted from a pattern is called a feature; it can be a distinctive measurement, a transformation, or a structural component. The process of converting a pattern to features is called feature extraction. Each pattern can be viewed as a point (or a vector) in the feature space. The best features are selected using a feature selection algorithm. The selected features should best represent the classes, or best represent the distinction between classes, and the dimensionality of the selected feature space can be greatly reduced compared to the full feature space.

The statistical classification process, based on the probability distributions of the feature vectors, can be described as follows (a code sketch follows this list):

(1) Define the classes of patterns: $(C_1, C_2, \ldots, C_M)$.

(2) Extract and select the best features from a pattern: $x = (x_1, x_2, \ldots, x_N)$.

(3) Specify or learn the conditional probability function of a feature vector $x$ belonging to class $C_i$: $p(x \mid C_i)$.

(4) Choose a decision rule (the Bayes rule, the maximum likelihood rule, the Neyman-Pearson rule, or another rule).

(5) Find the decision boundaries.

The complete statistical classification process, as shown in Figure 1, includes pre-processing of observed or sensed data (such as segmentation, noise removal, filtering, spatial or temporal localization, and normalization of patterns), feature extraction, feature selection, learning, and classification. Feature extraction is accomplished with principal component analysis (PCA) or independent component analysis (ICA). Feature selection methods include branch and bound search (B&B), sequential forward selection (SFS), sequential backward selection (SBS), sequential forward floating search (SFFS), and sequential backward floating search (SBFS). Finally, learning and classification are accomplished with the Bayes classifier, the k-nearest neighbor (k-NN) classifier, the linear discriminant classifier (LDC), and the support vector machine (SVM), as indicated in Figure 1.

Figure 1. Basic stages of the statistical classification process.
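To make steps (1)-(5) concrete, the following minimal sketch implements a Gaussian Bayes classifier. The class models, priors, and synthetic two-dimensional features are illustrative assumptions, not taken from the paper; the decision rule in step (4) is the Bayes rule, which assigns $x$ to the class maximizing $p(x \mid C_i)\,P(C_i)$.

```python
# Minimal sketch of steps (1)-(5): a Gaussian Bayes classifier on synthetic
# 2-D features. The class models, priors, and data here are illustrative
# assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# (1) Two classes C1, C2 with assumed equal priors.
priors = np.array([0.5, 0.5])

# (2) Feature vectors x = (x1, x2): one Gaussian cluster per class.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2) * 2.0]

def log_likelihood(x, mean, cov):
    """(3) log p(x | Ci) under a multivariate Gaussian class model."""
    d = x - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ inv @ d + logdet + len(x) * np.log(2 * np.pi))

def classify(x):
    """(4) Bayes decision rule: pick the class maximizing p(x|Ci) P(Ci)."""
    scores = [log_likelihood(x, m, c) + np.log(p)
              for m, c, p in zip(means, covs, priors)]
    # (5) The decision boundary is the locus where the scores are equal.
    return int(np.argmax(scores))

x = rng.multivariate_normal(means[1], covs[1])
print("sample assigned to class", classify(x) + 1)
```

Working in log probabilities, as above, is the usual way to avoid numerical underflow when the feature dimension grows.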
2 FEATURE EXTRACTION

2.1 Feature Extraction and Dimensionality Reduction

Feature extraction converts data patterns to features, which are condensed representations of patterns that contain only the salient information (as shown in Figure 2). The converted features should represent the patterns with minimal loss of the information required for the best classification.

Figure 2. Feature extraction converts the data pattern space to the feature space.

Features include non-transformed structural characteristics, transformed structural characteristics, and structures (such as lines, slopes, corners, or peaks). Non-transformed structural characteristics are obtained directly from sensor observations, such as amplitudes, phases, time durations, or moments. Transformed structural characteristics are obtained from transformations such as the Fourier transform, wavelet transform, time-frequency transform, singular value decomposition, or Karhunen-Loève transform.

Linear transforms, such as PCA and linear discriminant analysis (LDA), are widely used for feature extraction and dimensionality reduction. PCA is the best-known unsupervised linear feature extraction algorithm; it is a linear mapping which uses the eigenvectors with the largest eigenvalues. LDA is a supervised linear mapping based on eigenvectors, and it usually performs better than PCA for classification. ICA [2-4] is also a linear mapping, but one whose weights must be found iteratively, which makes it suitable for non-Gaussian distributions.

ICA decomposes a set of features into a basis whose components are statistically independent. It searches for a linear transformation $W_{ICA}$ (or weight matrix) to express a set of feature vectors $X = (x_1, x_2, \ldots, x_N)$ as a linear combination of statistically independent vectors $Y = (y_1, y_2, \ldots, y_N)$, so that the transformed components $Y = W_{ICA}^T X$ are independent; that is, knowledge of the value of $y_i$ provides no information about the value of $y_j$ for $i \neq j$. There is no closed-form solution for finding the weight matrix $W_{ICA}$, so iterative algorithms have been proposed to search for it (see the FastICA-style sketch below).

PCA only requires that the coefficients $y_i$ and $y_j$ be uncorrelated, i.e.,

$\mathrm{cov}(y_i, y_j) = E\{y_i y_j\} - E\{y_i\}E\{y_j\} = 0.$

Independence is a stronger requirement: independent components are always uncorrelated, but uncorrelated components may not be independent. Thus, ICA accounts for higher-order statistics and provides a more powerful data representation than PCA.

Kernel PCA is a nonlinear feature extraction method based on eigenvectors: it maps input patterns into a new feature space through a nonlinear function, and then performs linear PCA in the mapped space.
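As an illustration of the PCA mapping and the uncorrelatedness condition above, the sketch below computes the eigenvectors of the sample covariance matrix, orders them by eigenvalue, and checks that $\mathrm{cov}(y_i, y_j) \approx 0$ for the transformed components. The mixing matrix and uniform sources are assumptions made only for this example.

```python
# Sketch of PCA as described in the text: project onto the eigenvectors of
# the covariance matrix, largest eigenvalues first, and verify that the
# transformed components are uncorrelated. Data are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(1)

# Mix two independent non-Gaussian (uniform) sources into observations X.
S = rng.uniform(-1, 1, size=(2, 5000))          # independent sources
A = np.array([[2.0, 1.0], [1.0, 1.0]])          # assumed mixing matrix
X = A @ S

# PCA: eigen-decompose the sample covariance, sort by eigenvalue.
Xc = X - X.mean(axis=1, keepdims=True)          # center the data
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc))
order = np.argsort(eigvals)[::-1]               # largest eigenvalues first
W_pca = eigvecs[:, order].T
Y = W_pca @ Xc

# cov(yi, yj) = E{yi yj} - E{yi}E{yj} is (numerically) zero off-diagonal.
print("off-diagonal covariance:", np.cov(Y)[0, 1])

# Keeping only the top rows of W_pca performs dimensionality reduction.
Y1 = W_pca[:1] @ Xc
```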
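The iterative search for $W_{ICA}$ can be sketched with one unit of a FastICA-style fixed-point update, one of the iterative ICA algorithms in the literature [2-4]. The tanh contrast function and the toy data below are assumptions of this sketch, not the paper's algorithm.

```python
# Sketch of the iterative search for the ICA weight matrix, which the text
# notes has no closed form: a single-unit FastICA-style fixed-point update.
# The tanh contrast function and synthetic sources are assumptions.
import numpy as np

rng = np.random.default_rng(2)

# Mixed observations of two independent, non-Gaussian (uniform) sources.
S = rng.uniform(-1, 1, size=(2, 5000))
X = np.array([[2.0, 1.0], [1.0, 1.0]]) @ S

# Whiten first (a PCA step): rotate and scale so the covariance is identity.
Xc = X - X.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc))
Z = (eigvecs / np.sqrt(eigvals)).T @ Xc

# Fixed point: w <- E{z g(w'z)} - E{g'(w'z)} w, renormalized each step.
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(200):
    wz = w @ Z
    g, g_prime = np.tanh(wz), 1.0 - np.tanh(wz) ** 2
    w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
    w_new /= np.linalg.norm(w_new)
    converged = abs(abs(w_new @ w) - 1.0) < 1e-8   # direction stabilized
    w = w_new
    if converged:
        break

y = w @ Z   # one estimated independent component
# A uniform source has negative excess kurtosis (about -1.2), a marker of
# the non-Gaussianity that ICA exploits and PCA ignores.
print("excess kurtosis of recovered component:", np.mean(y**4) - 3)
```

Whitening before the iteration is the standard trick: it leaves only a rotation to be found, which is why ICA is often described as PCA plus a higher-order-statistics refinement.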
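A minimal kernel PCA example follows, assuming scikit-learn's KernelPCA as the implementation (the paper does not name one): the RBF kernel supplies the nonlinear mapping, and linear PCA is performed implicitly in the mapped space.

```python
# Hedged sketch of kernel PCA with scikit-learn (an assumed dependency).
# An RBF kernel maps patterns nonlinearly; PCA then runs in that space.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5))            # 200 patterns, 5 raw features
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)
Z = kpca.fit_transform(X)                    # nonlinear 2-D feature space
print(Z.shape)                               # (200, 2)
```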