Cleaning up the record on the maximal information coefficient and equitability

Although we appreciate Kinney and Atwal’s interest in equitability and the maximal information coefficient (MIC), we believe they misrepresent our work. We highlight a few of our main objections below.

[Fig. 1. Equitability of MIC and mutual information under a range of noise models. The equitability of MIC and mutual information across a subset of noise models analyzed in refs. 1 and 4. For each noise model, the relationships tested are as in ref. 4. In each ...]

Regarding our original paper (1), Kinney and Atwal (2) state “MIC is said to satisfy not just the heuristic notion of equitability, but also the mathematical criterion of R² equitability,” the latter being their formalization of the heuristic notion that we introduced. This statement is simply false. We were explicit in our paper that our claims regarding MIC’s performance were based on large-scale simulations: “We tested MIC’s equitability through simulations…. [These] show that, for a large collection of test functions with varied sample sizes, noise levels, and noise models, MIC roughly equals the coefficient of determination R² relative to each respective noiseless function.” Although we mathematically proved several things about MIC, none of our claims imply that it satisfies Kinney and Atwal’s R² equitability, which would require that MIC exactly equal R² in the infinite data limit. Thus, their proof that no dependence measure can satisfy R² equitability, although interesting, does not uncover any error in our work, and their suggestion that it does is a gross misrepresentation.

Kinney and Atwal seem ready to toss out equitability as a useful criterion based on their theoretical result. We argue, however, that regardless of whether “perfect” equitability is possible, approximate notions of equitability remain the right goal for many data exploration settings.
Just as the theory of NP-completeness does not suggest that we stop thinking about NP-complete problems, but instead that we look for approximations and solutions in restricted cases, an impossibility result about perfect equitability provides focus for further research but does not mean that useful solutions are unattainable. Similarly, as others have noted (3), Kinney and Atwal’s proof requires a highly permissive noise model, and so the attainability of R² equitability under more limited noise models such as those in our work remains an open question.

Finally, the authors argue that mutual information is more equitable than MIC. However, they provide as justification only a single noise model, only at limiting sample sizes (n ≥ 5,000). As we’ve shown in follow-up work (4), which they themselves cite but fail to address, MIC is more equitable than mutual information estimation under many other realistic noise models even at a sample size of 5,000. Kinney and Atwal have stated, “…it matters how one defines noise” (5), and a useful statistic must indeed be robust to a wide range of noise models. Equally important, we’ve established in both our original and follow-up work that at sample sizes below 5,000, MIC is more equitable than mutual information estimates across all noise models tested. MIC’s superior equitability in these settings is not an “artifact” we neglected, as Kinney and Atwal suggest, but rather a weakness of mutual information estimation and an important consideration for practitioners.

We expect that the understanding of equitability and MIC will improve over time and that better methods may arise. However, accurate representations of the work thus far will allow researchers in the area to move forward most productively and collectively.
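As an illustration of the simulation-based notion of equitability discussed in this letter (a statistic roughly tracking the coefficient of determination relative to the noiseless function, whatever the relationship type), the following is a minimal sketch of how such a check might be set up. It is not the simulation code behind refs. 1 or 4, and it does not compute MIC. The noise model (additive Gaussian noise in Y only), the three test functions, the equal-width binning, and every function name here are assumptions made for illustration. The naive plug-in mutual information estimate stands in for "some dependence statistic" and is not the estimator used in any of the cited work.

```python
import math
import random


def r_squared(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov * cov / (var_a * var_b)


def bin_index(value, lo, hi, bins):
    """Equal-width bin index of value within [lo, hi]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def binned_mutual_information(xs, ys, bins=10):
    """Naive plug-in mutual information estimate, in bits, on an equal-width grid."""
    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        i = bin_index(x, x_lo, x_hi, bins)
        j = bin_index(y, y_lo, y_hi, bins)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    # Sum of p(i, j) * log2(p(i, j) / (p(i) * p(j))) over occupied cells.
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def simulate(f, target_r2, n, rng):
    """Draw X ~ Uniform(0, 1) and Y = f(X) + Gaussian noise.

    The noise standard deviation is chosen so that the R^2 between f(X)
    and Y is approximately target_r2 (assumed noise model, Y-noise only).
    """
    xs = [rng.random() for _ in range(n)]
    fx = [f(x) for x in xs]
    mean_fx = sum(fx) / n
    var_fx = sum((v - mean_fx) ** 2 for v in fx) / n
    noise_sd = math.sqrt(var_fx * (1.0 - target_r2) / target_r2)
    ys = [v + rng.gauss(0.0, noise_sd) for v in fx]
    return xs, fx, ys


# Hypothetical test relationships, chosen only for illustration.
RELATIONSHIPS = {
    "linear": lambda x: x,
    "parabola": lambda x: 4.0 * (x - 0.5) ** 2,
    "sine": lambda x: math.sin(4.0 * math.pi * x),
}

if __name__ == "__main__":
    rng = random.Random(0)
    for name, f in RELATIONSHIPS.items():
        xs, fx, ys = simulate(f, target_r2=0.5, n=2000, rng=rng)
        # Realized R^2 relative to the noiseless function, then the statistic.
        print(name, round(r_squared(fx, ys), 3),
              round(binned_mutual_information(xs, ys), 3))
```

Under this setup, all three relationships are generated at roughly the same R² relative to their noiseless functions, so the spread of the statistic's values across rows gives a rough sense of how far it is from equitable for this one noise model and sample size. Changing `n`, `target_r2`, or the noise model is the kind of variation the letter argues a fair comparison should cover.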

[1] Daniel S. Murrell, et al. R²-equitability is satisfiable, 2014, Proceedings of the National Academy of Sciences.

[2] Reply to Murrell et al.: Noise matters, 2014, Proceedings of the National Academy of Sciences.

[3] J. Kinney, et al. Equitability, mutual information, and the maximal information coefficient, 2013, Proceedings of the National Academy of Sciences.

[4] Michael Mitzenmacher, et al. Detecting Novel Associations in Large Data Sets, 2011, Science.