Classification based on distance in multivariate Gaussian cases
The author previously treated the problem of classification in discrete cases, employing the notion of distance [1]. The purpose of this paper is to treat that problem for multivariate Gaussian cases from the same point of view.

The classification problem is formulated as follows. Let $\{\omega_\nu\}$ be a class of sets of distributions, and let $X$ be the random variable under consideration. The problem is to decide which $\omega_\nu$ is considered to contain the distribution of $X$. We assume, of course, that $\omega_\nu$ and $\omega_\mu$ have no common distributions when $\nu \neq \mu$. Further, for efficient decision making we assume that, for a suitable distance $d(\cdot, \cdot)$ in the space of distributions concerned, we have $d(\omega_\nu, \omega_\mu) > a$ $(> 0)$ for $\nu \neq \mu$. In some cases where $d(\omega_\nu, \omega_\mu) = 0$, we can represent each $\omega_\nu$ by a single distribution $F_\nu$ so that $d(F_\nu, F_\mu) > 0$. For such an $F_\nu$, we can consider the distribution of $\omega_\nu$ averaged with respect to an adequate distribution over $\omega_\nu$.

When the distributions concerned are all known, the decision rule for the above problem runs as follows. Let $S_n$ be an 'empirical' distribution based on $n$ observations on $X$. We compare the magnitudes of $d(S_n, \omega_\nu)$ and take the set which minimizes $d(S_n, \omega_\nu)$ as the set which contains the distribution of $X$. The problem is then to evaluate the success rate or error rate of this procedure.

In this paper, however, we shall treat the case where the distributions concerned are unknown. In that case we have to estimate them from observations, and for that the number of distributions concerned must be finite. Therefore, we assume that each $\omega_\nu$ consists of a single distribution $F_\nu$ and that the number of $F_\nu$ is finite. In the present paper, we do not explicitly take into account a priori probabilities and costs of misclassification. However, our procedure will also apply, with a slight modification, to the case where they need to be considered.
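
The minimum-distance rule described above can be illustrated with a short sketch. It is not the paper's procedure: this section does not say which distance $d$ is used, so the squared Hellinger distance between fitted Gaussians is swapped in as a stand-in, and the univariate case is used instead of the multivariate one to keep the code short. The function names (`fit_gaussian`, `hellinger_sq`, `classify`) and the sample data are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(sample):
    """Estimate (mean, variance) of a univariate Gaussian from a sample.

    Assumption: the sample has non-zero variance.
    """
    return fmean(sample), pvariance(sample)


def hellinger_sq(p, q):
    """Squared Hellinger distance between two univariate Gaussians.

    Each argument is a (mean, variance) pair. The distance is 1 minus the
    Bhattacharyya coefficient, which has a closed form for Gaussians.
    """
    (m1, v1), (m2, v2) = p, q
    s = v1 + v2
    coeff = math.sqrt(2.0 * math.sqrt(v1 * v2) / s)
    coeff *= math.exp(-((m1 - m2) ** 2) / (4.0 * s))
    return 1.0 - coeff


def classify(observations, training):
    """Assign the observations to the candidate with the smallest distance.

    `training` maps a label to a training sample for that candidate
    distribution, since the candidates are unknown and must be estimated.
    The Gaussian fitted to `observations` plays the role of S_n.
    """
    s_n = fit_gaussian(observations)
    distances = {
        label: hellinger_sq(s_n, fit_gaussian(sample))
        for label, sample in training.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical training samples for two candidate distributions.
training = {
    "F1": [0.1, -0.2, 0.3, -0.1, 0.0],
    "F2": [4.9, 5.2, 5.1, 4.8, 5.0],
}
label, distances = classify([0.2, -0.3, 0.1, 0.05], training)
print(label, distances)
```

Because the candidates are estimated from finite samples, the computed distances are themselves random, which is why the success or error rate of the rule has to be evaluated separately.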