Querying the Internet with PIER

The database research community prides itself on scalable technologies. Yet database systems traditionally do not excel on one important scalability dimension: the degree of distribution. This limitation has hampered the impact of database technologies on massively distributed systems like the Internet. In this paper, we present the initial design of PIER, a massively distributed query engine based on overlay networks, which is intended to bring database query processing facilities to new, widely distributed environments. We motivate the need for massively distributed queries, and argue for a relaxation of certain traditional database research goals in the pursuit of scalability and widespread adoption. We present simulation results showing PIER gracefully running relational queries across thousands of machines, and show results from the same software base in actual deployment on a large experimental cluster.
