Collaborative and privacy-preserving estimation of IP address space utilisation

Abstract

Exhaustion of the IPv4 address space is driving the adoption of mitigation technologies such as carrier-grade NAT and IPv6. Understanding this driver requires knowing how much of the allocated IPv4 space is actively used over time, which is a non-trivial goal because of privacy concerns and practical measurement challenges. To address this gap, we present a collaborative and privacy-preserving capture-recapture (CR) technique for estimating IP address space utilisation. Public and private datasets of IP addresses observed by multiple independent collaborators can be combined for CR analysis without any individual collaborator’s privately observed addresses leaking to the others. We show that CR estimation is much more accurate than assuming that all used addresses are observed, and that our scheme scales well to datasets of over a billion addresses across several collaborators. We estimate that 1.2 billion IPv4 addresses and 6.5 million /24 subnets were actively used at the end of 2014, and we also analyse address usage by RIR and country.
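
To illustrate the idea behind capture-recapture, the sketch below applies the classic two-source Lincoln-Petersen estimator, and Chapman's bias-corrected variant, to two sets of observed addresses. It is a minimal illustration only: it assumes two independent sources and a closed population, it does not include any privacy-preserving step, and it is not the estimator used in the paper. The function names and the example addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
def lincoln_petersen(source_a, source_b):
    """Two-source capture-recapture estimate of the total population size.

    Assumes the two sources sample the population independently; this is an
    illustrative simplification, not the paper's estimator.
    """
    a, b = set(source_a), set(source_b)
    overlap = len(a & b)  # addresses seen by both sources ("recaptures")
    if overlap == 0:
        raise ValueError("no overlap between sources; estimate is undefined")
    return len(a) * len(b) / overlap


def chapman(source_a, source_b):
    """Chapman's bias-corrected variant; defined even when the overlap is zero."""
    a, b = set(source_a), set(source_b)
    overlap = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1


# Hypothetical observations from two collaborators.
seen_by_a = {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"}
seen_by_b = {"192.0.2.3", "192.0.2.4", "192.0.2.5"}

observed = len(seen_by_a | seen_by_b)            # addresses actually observed
estimated = lincoln_petersen(seen_by_a, seen_by_b)  # includes unobserved ones
```

In this toy case the union of the two sources contains five addresses, while the estimate is six, which reflects the point made in the abstract that simply counting observed addresses understates the used space.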
