Case Study: Scoop for Partial Read from P2P Database

In this paper we propose Scoop, a mechanism to implement the “partial read operation” for peer-to-peer databases. A peer-to-peer database is a database whose relations are horizontally fragmented and distributed among the nodes of a peer-to-peer network. The partial read operation is a data retrieval operation required for approximate query processing in peer-to-peer databases. A partial read operation answers β-queries: given β ∈ [0,1] and a relation R, a fraction β of the tuples in R must be retrieved from the database to answer a β-query. Despite the simplicity of the β-query, the distributed, evolving, and autonomous nature of peer-to-peer databases makes a correct and efficient implementation of the partial read operation challenging. Scoop is based on an epidemic dissemination algorithm. We model the epidemic dissemination as a percolation problem and, through rigorous percolation analysis, tune Scoop per query and on the fly to answer β-queries correctly and efficiently. We prove the correctness of Scoop by theoretical analysis and verify its efficiency in terms of query cost and query time via extensive simulation.
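
To make the β-query concrete, the following Python sketch shows a generic probabilistic flooding read over a horizontally fragmented relation. It is not the Scoop algorithm: the function names, the `forward_prob` parameter, and the assumption that the total tuple count is known to the caller are all hypothetical and chosen only for illustration.

```python
import random
from collections import deque


def epidemic_read(fragments, neighbors, origin, forward_prob, rng):
    """Collect tuples from the peers reached by probabilistic flooding.

    fragments: dict mapping peer id -> list of tuples stored at that peer
    neighbors: dict mapping peer id -> list of adjacent peer ids
    forward_prob: hypothetical per-link relay probability (assumption)
    """
    visited = {origin}
    queue = deque([origin])
    collected = list(fragments[origin])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            # Each link relays the query independently with forward_prob.
            if nxt not in visited and rng.random() < forward_prob:
                visited.add(nxt)
                queue.append(nxt)
                collected.extend(fragments[nxt])
    return collected


def answers_beta_query(collected, total_tuples, beta):
    """True if at least a fraction beta of the relation was retrieved.

    total_tuples is assumed known here purely for illustration.
    """
    return len(collected) >= beta * total_tuples


if __name__ == "__main__":
    # Toy network: a ring of six peers holding two tuples each.
    peers = range(6)
    fragments = {p: [(p, 0), (p, 1)] for p in peers}
    neighbors = {p: [(p - 1) % 6, (p + 1) % 6] for p in peers}
    result = epidemic_read(fragments, neighbors, 0, 0.7, random.Random(1))
    print(len(result), answers_beta_query(result, 12, 0.5))
```

In this toy model, a higher `forward_prob` reaches more peers at a higher message cost, which is the trade-off that a per-query tuning of the dissemination would have to balance.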
