Paper Information - Visual Comparison of Datasets Using Mixture Decompositions

Visual Comparison of Datasets Using Mixture Decompositions

This article describes how a mixture of two densities, f0 and f1, may be decomposed into a different mixture consisting of three densities. These new densities, f+, f-, and f=, summarize the differences between f0 and f1: f+ is high in areas where f1 is in excess compared to f0; f- represents the deficiency of f1 compared to f0 in the same way; f= represents the commonality between f1 and f0. The supports of f+ and f- are disjoint. This decomposition of the mixture of f0 and f1 is similar to the set-theoretic decomposition of the union of two sets A and B into the disjoint sets A \ B, B \ A, and A ∩ B. Sample points from f0 and f1 can be assigned to one of these three densities, allowing the differences between f0 and f1 to be visualized in a single plot, which serves as a visual hypothesis test of whether f0 is equal to f1. We describe two such similar decompositions and contrast their behavior under the null hypothesis f0 = f1, giving some insight into how such plots may be interpreted. We present two examples of uses of these methods: visualization of departures from independence, and visualization of a two-class classification problem. Other potential applications are discussed.
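The abstract does not state formulas, so the following is only a minimal sketch of one construction that matches its description, not necessarily either of the two decompositions in the article. It assumes an equal-weight mixture (f0 + f1) / 2 and takes the unnormalized parts to be the positive part of f1 - f0, the positive part of f0 - f1, and the pointwise minimum of f0 and f1. The two Gaussian densities, the grid, and the names `normal_pdf` and `decompose` are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution, evaluated pointwise on x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def decompose(f0, f1):
    """Split the equal-weight mixture (f0 + f1) / 2 into three unnormalized parts.

    Assumed construction (not taken from the article):
      plus   = (f1 - f0)_+ / 2   excess of f1 over f0
      minus  = (f0 - f1)_+ / 2   deficiency of f1 relative to f0
      common = min(f0, f1)       mass shared by both densities
    """
    plus = 0.5 * np.maximum(f1 - f0, 0.0)
    minus = 0.5 * np.maximum(f0 - f1, 0.0)
    common = np.minimum(f0, f1)
    return plus, minus, common

# Hypothetical example: two unit-variance Gaussians with shifted means.
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
f0 = normal_pdf(x, -1.0, 1.0)
f1 = normal_pdf(x, 1.0, 1.0)

plus, minus, common = decompose(f0, f1)
mixture = 0.5 * (f0 + f1)

# Mixture weights of the three components (Riemann-sum approximation).
p_plus = plus.sum() * dx
p_minus = minus.sum() * dx
p_common = common.sum() * dx

# Normalized densities f+, f-, f= on the grid.
f_plus = plus / p_plus
f_minus = minus / p_minus
f_common = common / p_common

# Probability that a point at x, drawn from the mixture, belongs to each part.
# These can be used to assign sample points to f+, f-, or f=.
post_plus = plus / mixture
post_minus = minus / mixture
post_common = common / mixture
```

Under this construction the three parts add up to the mixture at every point, and the excess and deficiency parts are never positive at the same point, which mirrors the disjoint-support property stated in the abstract. When f0 = f1, both the excess and deficiency parts vanish and all mass falls in the common part.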

Andreas Buja | Alan Gous
