Secure and robust overlay content distribution

As successful applications have spurred a tremendous increase in the volume of data transferred, efficient and reliable content distribution has become a key issue. Peer-to-peer (P2P) technology has gained popularity as a promising approach to large-scale content distribution because of benefits such as self-organization, load balancing, and fault tolerance. Despite these strengths, P2P systems also pose several challenges, including performance guarantees, reliability, efficiency, and security. In large-scale P2P deployments these challenges are harder to address because of the large number of participants, unreliable user behavior, and unexpected situations. This thesis explores solutions that improve the efficiency, robustness, and security of large-scale P2P content distribution systems, focusing on three issues: lookup, practical network coding, and secure network coding.
A distributed hash table (DHT) is a structured overlay network service that provides decentralized lookup, mapping objects to locations. This thesis focuses on improving the lookup performance of the Kademlia DHT protocol. Although many DHTs have been proposed as a means of organizing and locating peers in distributed systems, to the best of my knowledge Kademlia is the only DHT that has been deployed at Internet scale in the real world. This study evaluates the lookup performance of Kad, a variant of Kademlia deployed in one of the largest P2P file-sharing networks. The measurement study shows that lookup results are inconsistent: only 18% of the nodes located by storing and searching lookups for the same key are the same. This lookup inconsistency leads to poor lookup performance and inefficient use of resources. This study identifies the underlying causes of the inconsistency and of the poor lookup performance, and proposes solutions that guarantee reliable lookup results while using resources efficiently.
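To make the lookup-consistency figure concrete, the sketch below shows Kademlia's XOR distance metric, selection of the k closest nodes to a key, and a simple overlap ratio between the node sets returned by a storing lookup and a searching lookup. It is an illustration under stated assumptions, not the thesis's measurement method: the exact consistency metric used in the study may differ, and `partial_view` is a hypothetical, deliberately crude model in which each lookup sees only a random fraction of the nodes. All function names are hypothetical; IDs are taken to be 128-bit integers, the size Kad uses.

```python
import random

ID_BITS = 128  # assumption: 128-bit node/key IDs, the size Kad uses


def xor_distance(a: int, b: int) -> int:
    """Kademlia's metric: the bitwise XOR of two IDs, read as an unsigned integer."""
    return a ^ b


def k_closest(target: int, candidates, k: int) -> list:
    """Return the k candidate IDs closest to target under the XOR metric."""
    return sorted(candidates, key=lambda node: xor_distance(node, target))[:k]


def overlap(result_a, result_b) -> float:
    """Fraction of the nodes in result_a that also appear in result_b."""
    if not result_a:
        return 0.0
    return len(set(result_a) & set(result_b)) / len(result_a)


def partial_view(nodes, fraction: float, rng: random.Random) -> list:
    """Hypothetical stand-in for an incomplete lookup: it discovers only a
    random subset of the live nodes instead of converging on the true closest set."""
    count = max(1, int(len(nodes) * fraction))
    return rng.sample(nodes, count)


if __name__ == "__main__":
    rng = random.Random(42)
    nodes = [rng.getrandbits(ID_BITS) for _ in range(10_000)]
    target = rng.getrandbits(ID_BITS)  # key that is stored and later searched for
    k = 10

    ideal = k_closest(target, nodes, k)
    stored_on = k_closest(target, partial_view(nodes, 0.3, rng), k)
    searched_at = k_closest(target, partial_view(nodes, 0.3, rng), k)

    print("storing vs. searching overlap:", overlap(stored_on, searched_at))
    print("storing vs. ideal overlap:", overlap(stored_on, ideal))
```

Note that XOR closeness is not numeric closeness: for target 12, node 8 (distance 4) is closer than node 1 (distance 13). Setting the visible fraction to 1.0 makes both lookups return the ideal set (overlap 1.0); smaller fractions let the two result sets drift apart. The real causes of inconsistency in Kad are those identified by the study, not this random model.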
This thesis also studies the practicality of network coding for cooperative content distribution. Network coding is a recent data transmission technique that allows any node in a network to encode and distribute data. It promises reliability and efficiency in content distribution, but its usefulness remains disputed because of uncertain performance gains and coding overhead in practice. By implementing network coding in a real-world application, this thesis measures its performance and overhead for content distribution in practice. This study also provides a lightweight yet efficient encoding scheme that lets network coding deliver improved performance and robustness with negligible overhead.
Although network coding is a promising data transmission technique, it also introduces security vulnerabilities by allowing untrusted nodes to produce new encoded data. In particular, network coding is highly vulnerable to pollution attacks, in which malicious nodes inject corrupted data into the network. Because encoded blocks are mixed with one another, even a single unfiltered corrupted block can propagate widely and prevent correct decoding at many nodes. Since blocks are re-encoded in transit, traditional hash or signature schemes do not work with network coding: a digest or signature that the source computes over its original blocks cannot be checked against a freshly recombined block. This thesis therefore introduces a new homomorphic signature scheme that efficiently verifies encoded data on the fly and offers features suited to P2P content distribution. The scheme protects network coding from pollution attacks without delaying the download process.
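The two network coding paragraphs above can be illustrated together with a small sketch: random linear encoding over a prime field, recoding at an intermediate peer without decoding, decoding by Gaussian elimination, and on-the-fly verification of any coded block. This is not the thesis's homomorphic signature scheme or its encoding scheme; it swaps in the simpler discrete-logarithm homomorphic hash of Krohn, Freedman and Mazières (2004), H(x) = ∏ g_i^{x_i} mod p, which has the same key property: the hash of a linear combination equals the corresponding product of source-block hashes. The parameters p = 1019, q = 509 are toy values chosen only so the example runs instantly and are completely insecure; all function names are hypothetical, and the source-block hashes are assumed to reach peers over an authenticated channel.

```python
import random

# Toy, insecure parameters: p = 2q + 1 with p and q prime.
Q = 509   # field for coding coefficients and data symbols (Z_q)
P = 1019  # modulus for the homomorphic hash (Z_p^*)


def add_scaled(acc, vec, c):
    """Element-wise acc + c * vec over Z_q."""
    return [(a + c * v) % Q for a, v in zip(acc, vec)]


def encode(blocks, rng):
    """Source: random linear combination of all blocks -> (coefficients, payload)."""
    coeffs = [rng.randrange(Q) for _ in blocks]
    payload = [0] * len(blocks[0])
    for c, b in zip(coeffs, blocks):
        payload = add_scaled(payload, b, c)
    return coeffs, payload


def recode(coded, rng):
    """Intermediate peer: mixes the coded blocks it holds without decoding them."""
    new_c, new_p = [0] * len(coded[0][0]), [0] * len(coded[0][1])
    for coeffs, payload in coded:
        a = rng.randrange(Q)
        new_c = add_scaled(new_c, coeffs, a)
        new_p = add_scaled(new_p, payload, a)
    return new_c, new_p


def decode(coded):
    """Gaussian elimination over Z_q on [coefficients | payload]; needs rank n."""
    n = len(coded[0][0])
    rows = [list(c) + list(p) for c, p in coded]
    for col in range(n):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("coefficient matrix is not full rank")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, Q)  # modular inverse (Python 3.8+)
        rows[col] = [(x * inv) % Q for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % Q for x, y in zip(rows[r], rows[col])]
    return [row[n:] for row in rows[:n]]


def make_generators(m, rng):
    """m elements of the order-q subgroup of Z_p^*: squares of r not in {1, p-1}."""
    return [pow(rng.randrange(2, P - 1), 2, P) for _ in range(m)]


def hhash(gens, vec):
    """Homomorphic hash H(x) = prod_i g_i^{x_i} mod p."""
    h = 1
    for g, x in zip(gens, vec):
        h = h * pow(g, x, P) % P
    return h


def verify(gens, source_hashes, coeffs, payload):
    """Accept a coded block iff H(payload) == prod_j H(x_j)^{c_j} mod p."""
    expected = 1
    for hj, c in zip(source_hashes, coeffs):
        expected = expected * pow(hj, c, P) % P
    return hhash(gens, payload) == expected


if __name__ == "__main__":
    rng = random.Random(7)
    n, m = 4, 8  # 4 source blocks of 8 symbols each
    blocks = [[rng.randrange(Q) for _ in range(m)] for _ in range(n)]
    gens = make_generators(m, rng)
    source_hashes = [hhash(gens, b) for b in blocks]  # published by the source

    coded = [encode(blocks, rng) for _ in range(n + 2)]
    mixed = recode(coded, rng)
    print("recoded block verifies:", verify(gens, source_hashes, *mixed))

    # A polluter alters one symbol; the hash changes by a factor g_0 != 1.
    forged = (mixed[0], [(mixed[1][0] + 1) % Q] + mixed[1][1:])
    print("polluted block verifies:", verify(gens, source_hashes, *forged))
    print("decoded correctly:", decode(coded) == blocks)
```

Verification works because each g_i has order q, so exponents can be reduced mod q exactly as the coding arithmetic is; a peer can therefore drop a polluted block before mixing it into anything it forwards. Real deployments need parameters of cryptographic size, which is where the cost that the thesis's scheme targets comes from.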