Applying Bayesian classification to protein structure

This report describes the advantages of Bayesian classification over traditional methods and the challenges of applying the Autoclass III program, a heuristic Bayesian classifier, to biotechnology and protein structure classification. Heuristic Bayesian classification is a machine learning technique that specifically addresses the question of how many classes a dataset should be divided into, as well as what those classes should be. The method is based on a minimal message length description of the dataset: the cost (in bits) of specifying a classification is added to the cost of accounting for each exemplar in terms of its distance from the class definition, and the total cost is minimized. In addition to providing a well-founded estimate of the number of classes needed to optimally characterize a dataset, this method also generates test classifications where within-class variances differ significantly.
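
The two-part cost described in the abstract can be illustrated with a small sketch. This is not Autoclass III and not the authors' procedure. It assumes one-dimensional data, hard assignment of exemplars by a simple k-means step, Gaussian classes, a fixed measurement precision, and a (1/2) log2 n bits-per-parameter approximation for the cost of stating the classification. All names (`kmeans_1d`, `message_length_bits`, `precision`) and the synthetic data are hypothetical.

```python
import math
import random


def kmeans_1d(xs, k, iters=50):
    """Deterministic 1-D k-means; centers start at evenly spaced quantiles."""
    s = sorted(xs)
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assign each exemplar to its nearest center.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def message_length_bits(xs, labels, k, precision=0.01):
    """Two-part message length (in bits) of a hard classification into k classes."""
    n = len(xs)
    # Part 1: cost of specifying the classification. Assumption: three
    # parameters per class (mean, std dev, weight), each costing (1/2) log2 n bits.
    model_bits = 0.5 * (3 * k) * math.log2(n)
    # Part 2: cost of accounting for each exemplar given its class.
    data_bits = 0.0
    for j in range(k):
        members = [x for x, l in zip(xs, labels) if l == j]
        if not members:
            continue
        mean = sum(members) / len(members)
        var = sum((x - mean) ** 2 for x in members) / len(members)
        sd = max(math.sqrt(var), precision)  # floor avoids a zero-width class
        weight = len(members) / n
        for x in members:
            # Bits to name the class, plus bits to state x to the assumed
            # precision under that class's Gaussian (its "distance" from the class).
            density = math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
            data_bits += -math.log2(weight) - math.log2(density * precision)
    return model_bits + data_bits


# Synthetic data: two classes whose within-class variances differ.
rng = random.Random(0)
xs = [rng.gauss(0.0, 0.5) for _ in range(200)] + [rng.gauss(10.0, 2.0) for _ in range(200)]

# Total cost for each candidate number of classes; the minimum selects k.
costs = {k: message_length_bits(xs, kmeans_1d(xs, k), k) for k in range(1, 6)}
best_k = min(costs, key=costs.get)
print(best_k, {k: round(c) for k, c in costs.items()})
```

Adding classes lowers the per-exemplar cost because each class fits its members more tightly, but it raises the cost of stating the classification and of naming each exemplar's class. The number of classes is chosen where the sum is smallest, and each class carries its own variance.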