An Internet Data Sharing Framework For Balancing Privacy and Utility

We revisit the common assumption that the privacy risks of sharing Internet infrastructure data outweigh the benefits, and suggest that we have a window of opportunity in which to apply methods for undertaking empirical Internet research that can lower privacy risks while achieving research utility. The current default, defensive posture of not sharing network data derives from the purgatory formed by gaps in regulation and law, commercial pressures, and evolving considerations of both threat models and ethical behavior. We propose the components of a self-regulating framework for transparent and repeatable sharing that can move the Internet research stakeholder community beyond the relatively siloed, ad hoc, and below-the-radar status quo practices towards a more reputable and pervasive scientific discipline.

The threat model from not sharing data is necessarily vague, as damages resulting from knowledge management deficiencies are beset with causation and correlation challenges. At a more basic level, we lack a risk profile for our communications fabric, partly as a result of the data dearth. Notably, society has not felt the pain points that normally motivate legislative, judicial, or policy change: explicit and immediate “body counts” or billion-dollar losses. Admittedly, the policies that have given rise to the Internet’s tremendous growth and support for network innovations have also rendered the entire sector opaque and unamenable to objective empirical macroscopic analysis, in ways and for reasons disconcertingly resonant with the U.S. financial sector before its 2008 meltdown. This opacity, juxtaposed with this decade’s proliferation of Internet security, scalability, sustainability, and stewardship issues, is a cause for concern for the integrity of the infrastructure, as well