Maximum likelihood modeling with Gaussian distributions for classification

Maximum likelihood (ML) modeling of multiclass data for classification often suffers from the following problems: (a) data insufficiency implying overtrained or unreliable models, (b) large storage requirement, (c) large computational requirement and/or (d) the ML is not discriminating between classes. Sharing parameters across classes (or constraining the parameters) clearly tends to alleviate the first three problems. We show that in some cases it can also lead to better discrimination (as evidenced by reduced misclassification error). The parameters considered are the means and variances of the Gaussians and linear transformations of the feature space (or equivalently the Gaussian means). Some constraints on the parameters are shown to lead to linear discrimination analysis (a well-known result) while others are shown to lead to optimal feature spaces (a relatively new result). Applications of some of these ideas to the speech recognition problem are also given.