Reasoning about Outliers by Modelling Noisy Data

Outliers are difficult to handle because some may be measurement errors, while others may represent phenomena of interest, something “significant” from the viewpoint of the application domain. Statistical methods for managing outliers do not distinguish between these two possibilities. In our previous work, we suggested a method for telling them apart by modelling “real measurements”, that is, how measurements should be distributed in a domain of interest. In this paper, we make the distinction by modelling measurement errors instead. The proposed method is better suited to applications where it is difficult to obtain relevant knowledge about real measurements. Test data collected from a recent glaucoma case-finding study in a general practice are used to evaluate the method.
