We revisit the common assumption that the privacy risks of sharing Internet infrastructure data outweigh the benefits, and suggest that we have a window of opportunity in which to apply methods for empirical Internet research that can lower privacy risks while still achieving research utility. The current default, defensive posture of not sharing network data stems from the purgatory created by gaps in regulation and law, commercial pressures, and evolving considerations of both threat models and ethical behavior. We propose the components of a self-regulating framework for transparent and repeatable sharing that can move the Internet research stakeholder community beyond the relatively siloed, ad hoc, and below-the-radar status quo practices toward a more reputable and pervasive scientific discipline. The threat model for not sharing data is necessarily vague, because damages resulting from knowledge-management deficiencies are beset by challenges of causation and correlation. At a more basic level, we lack a risk profile for our communications fabric, partly as a result of the data dearth. Notably, society has not felt the pain points that normally motivate legislative, judicial, or policy change: explicit and immediate “body counts” or billion-dollar losses. Admittedly, the policies that have given rise to the Internet’s tremendous growth and support for network innovation have also rendered the entire sector opaque and not amenable to objective, empirical, macroscopic analysis, in ways and for reasons disconcertingly resonant with the U.S. financial sector before its 2008 meltdown. This opaqueness, juxtaposed with this decade’s proliferation of Internet security, scalability, sustainability, and stewardship issues, is a cause for concern for the integrity of the infrastructure, as well