Outliers in statistical pattern recognition and an application to automatic chromosome classification

Abstract. We propose a heuristic method for estimating the parameters of mixture models from data containing outliers, and we design a Bayesian classifier that assigns m objects to n ⩾ m classes under constraints. We combine this outlier handling with the classifier and apply it to the well-known problem of automatic, constrained classification of chromosomes into their biological classes. Compared with the classical normal model, the method reduces the error rate by more than 50%. On the Edinburgh feature data of the large Copenhagen image data set Cpr, our best classifier achieves an error rate of about 1.3% per chromosome, and four out of five cells are classified correctly.
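The two ingredients named above, robust class-model estimation and constrained Bayesian assignment, can be illustrated with a small sketch. It is not the authors' method. It assumes one-dimensional features, normal class-conditional densities, a uniform prior over admissible assignments, and that each class may receive at most one object. The estimator is a generic trimming heuristic swapped in for the paper's own, whose details are not given here. All function names are hypothetical. The exhaustive search over injective object-to-class maps is exponential; a polynomial-time assignment or transportation algorithm would replace it in practice.

```python
import itertools
import math
import statistics


def fit_trimmed_normal(samples, trim_fraction=0.1, iterations=5):
    """Fit a univariate normal after discarding the most outlying samples.

    Generic trimming heuristic (an assumption, not the paper's estimator):
    repeatedly keep the (1 - trim_fraction) share of samples closest to the
    current mean, then re-estimate the mean from the retained samples.
    """
    n_keep = max(2, round(len(samples) * (1 - trim_fraction)))
    keep = list(samples)
    for _ in range(iterations):
        mean = statistics.fmean(keep)
        keep = sorted(samples, key=lambda x: abs(x - mean))[:n_keep]
    mean = statistics.fmean(keep)
    return mean, statistics.pvariance(keep, mu=mean)


def log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def unconstrained_assignment(features, class_params):
    """Classify each object independently by maximum likelihood."""
    return [max(range(len(class_params)),
                key=lambda j: log_normal(x, *class_params[j]))
            for x in features]


def constrained_map_assignment(features, class_params):
    """Joint MAP assignment of m objects to n >= m classes, one object per class.

    With a uniform prior over admissible assignments, the joint posterior is
    proportional to the product of class-conditional densities, so we maximise
    the summed log-likelihood over all injective object-to-class maps.
    """
    m, n = len(features), len(class_params)
    if m > n:
        raise ValueError("need at least as many classes as objects")
    loglik = [[log_normal(x, *p) for p in class_params] for x in features]
    best_score, best = -math.inf, None
    # Exhaustive search: exponential in m, for illustration only.
    for perm in itertools.permutations(range(n), m):
        score = sum(loglik[i][perm[i]] for i in range(m))
        if score > best_score:
            best_score, best = score, list(perm)
    return best, best_score


# Usage: the outlier 50.0 is trimmed before the class mean is estimated.
print(fit_trimmed_normal([0.9, 1.0, 1.1, 1.0, 50.0], trim_fraction=0.2)[0])  # 1.0

params = [(0.0, 1.0), (1.0, 1.0), (5.0, 1.0)]  # hypothetical (mean, variance) per class
feats = [0.1, 0.2]
print(unconstrained_assignment(feats, params))       # [0, 0]
print(constrained_map_assignment(feats, params)[0])  # [0, 1]
```

Classified independently, both objects fall into class 0; under the one-object-per-class constraint the joint optimum moves the second object to class 1, which is the kind of context the constrained classifier exploits.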
