Discriminant Analysis: A Powerful Classification Technique in Data Mining

Data mining is a collection of analytical techniques used to uncover new trends and patterns in massive databases. These techniques stress visualization, both to study the structure of the data thoroughly and to check the validity of the fitted statistical model, which in turn supports proactive decision making. Discriminant analysis is a data mining tool that uses multiple attributes to discriminate among the levels of a single classification variable. It also assigns observations to one of several pre-defined groups based on the values of those attributes. When the distribution within each group is assumed to be multivariate normal, a parametric method can be used to develop a discriminant function based on a generalized squared distance measure. The classification criterion uses either the individual within-group covariance matrices or the pooled covariance matrix, and it also takes into account the prior probabilities of the classes. Non-parametric discriminant methods are instead based on non-parametric estimates of the group-specific probability densities: either a kernel method or the k-nearest-neighbor method can be used to generate a density estimate in each group and to produce a classification criterion. The performance of a discriminant criterion can be evaluated by estimating the probabilities of misclassification of future observations. A user-friendly SAS application that uses the latest SAS macro capabilities to perform discriminant analysis is presented here. The Car93 data, which contain multiple attributes, are used to demonstrate the features of discriminant analysis in discriminating among three price groups: “LOW”, “MOD”, and “HIGH”.
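
As a rough illustration of the parametric criterion described in the abstract, the following Python sketch classifies an observation by a generalized squared distance that uses individual within-group covariance matrices and prior probabilities. It is not the SAS macro application presented in the paper. The specific distance form (Mahalanobis term plus log-determinant minus twice the log prior) is one common formulation and is assumed here. The function names, the restriction to two attributes (so that the 2x2 inverse can be written in closed form), and the toy data are likewise assumptions made for illustration; the numbers are invented and are not taken from Car93.

```python
import math

def mean_vec(rows):
    """Mean of each of the two attributes."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]

def cov_2x2(rows, mean):
    """Unbiased sample covariance of two attributes as (sxx, sxy, syy)."""
    n = len(rows)
    sxx = sum((r[0] - mean[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mean[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows) / (n - 1)
    return sxx, sxy, syy

def generalized_sq_distance(x, mean, cov, prior):
    """Assumed form: (x-m)' V^-1 (x-m) + ln|V| - 2 ln(prior)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a symmetric 2x2 matrix
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return maha + math.log(det) - 2 * math.log(prior)

def classify(x, groups, priors):
    """Return the group with the smallest distance and the posterior estimates."""
    d2 = {}
    for label, rows in groups.items():
        m = mean_vec(rows)
        d2[label] = generalized_sq_distance(x, m, cov_2x2(rows, m), priors[label])
    # Posterior for group j is proportional to exp(-0.5 * D_j^2);
    # subtracting the minimum distance keeps the exponentials well scaled.
    d_min = min(d2.values())
    weights = {k: math.exp(-0.5 * (v - d_min)) for k, v in d2.items()}
    total = sum(weights.values())
    posterior = {k: w / total for k, w in weights.items()}
    return min(d2, key=d2.get), posterior

# Invented toy data: two hypothetical attributes per observation, three price groups.
groups = {
    "LOW":  [(1.0, 2.0), (1.5, 1.8), (2.0, 2.6), (1.2, 2.9)],
    "MOD":  [(4.0, 5.0), (4.5, 5.8), (5.0, 5.2), (4.2, 6.1)],
    "HIGH": [(8.0, 9.0), (8.6, 9.9), (9.0, 8.7), (8.3, 9.5)],
}
priors = {"LOW": 1 / 3, "MOD": 1 / 3, "HIGH": 1 / 3}  # equal priors, assumed

label, posterior = classify((1.3, 2.2), groups, priors)
print(label, posterior)
```

Replacing the per-group covariance with a single pooled covariance matrix would make the log-determinant term identical across groups, so it could be dropped and the rule would become linear in the attributes.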