Visual Comparison of Datasets Using Mixture Decompositions

This article describes how a mixture of two densities, f0 and f1, may be decomposed into a different mixture consisting of three densities. These new densities, f+, f-, and f=, summarize differences between f0 and f1: f+ is high in areas of excess of f1 compared to f0; f- represents deficiency of f1 compared to f0 in the same way; f= represents commonality between f1 and f0. The supports of f+ and f- are disjoint. This decomposition of the mixture of f0 and f1 is similar to the set-theoretic decomposition of the union of two sets A and B into the disjoint sets AB, BA, and A ∩ B. Sample points from f0 and f1can be assigned to one of these three densities, allowing the differences between f0 and f1 to be visualized in a single plot, a visual hypothesis test of whether f0 is equal to f1. We describe two similar such decompositions and contrast their behavior under the null hypothesis f0 = f1, giving some insight into how such plots may be interpreted. We present two examples of uses of these methods: visualization of departures from independence, and of a two-class classification problem. Other potential applications are discussed.