Self-adaptive approximate queries for large-scale information aggregation

Self-adaptation enables distributed software to modify its behaviour in response to changes in its operating environment. In large-scale information systems for cloud computing that use hierarchical data aggregation, self-adaptation can be used to answer a query approximately, thereby reducing network bandwidth use and retrieval time. We present a novel algorithm that uses the Analytic Hierarchy Process (AHP) to apply network-aware self-adaptation to approximate queries. The AHP-based algorithm provides a trade-off among network usage, retrieval time and the accuracy of the retrieved results. Simulations show that, with AHP, the number of messages needed is reduced to a constant upper bound, and the retrieval time is reduced to a constant factor as the number of nodes increases. Our results demonstrate that the algorithm is able to provide responses with the required accuracy, primarily by adapting the depth of the query based on the number of messages and the network conditions.
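
To make the trade-off concrete, the following is a minimal sketch of how an AHP step could weigh the three criteria named above and pick a query depth. It is not the algorithm evaluated in the paper: the pairwise comparison values, the candidate depths with their estimated costs, the scoring rule and all function names (ahp_weights, consistency_ratio, choose_depth) are assumptions made for illustration. The priority vector is approximated with the row geometric mean method, a common stand-in for the principal eigenvector.

```python
import math

# Saaty's random consistency index for small matrices (standard AHP table).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate the AHP priority vector by normalised row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI; judgements are usually accepted when CR < 0.1."""
    n = len(matrix)
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    lam = sum(
        sum(a * w for a, w in zip(row, weights)) / w_i
        for row, w_i in zip(matrix, weights)
    ) / n
    ci = (lam - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def choose_depth(candidates, weights):
    """Pick the candidate with the best weighted score.

    weights are ordered (messages, time, accuracy). Messages and time are
    costs (lower is better); accuracy is a benefit (higher is better).
    """
    w_msg, w_time, w_acc = weights
    min_msg = min(c["messages"] for c in candidates)
    min_time = min(c["time_ms"] for c in candidates)
    max_acc = max(c["accuracy"] for c in candidates)

    def score(c):
        return (
            w_msg * min_msg / c["messages"]
            + w_time * min_time / c["time_ms"]
            + w_acc * c["accuracy"] / max_acc
        )

    return max(candidates, key=score)


# Hypothetical pairwise judgements, rows/columns = (messages, time, accuracy):
# accuracy is judged 3x as important as messages and 2x as important as time.
CRITERIA_MATRIX = [
    [1.0, 1 / 2, 1 / 3],
    [2.0, 1.0, 1 / 2],
    [3.0, 2.0, 1.0],
]

# Hypothetical per-depth estimates; a real system would measure these.
CANDIDATES = [
    {"depth": 1, "messages": 10, "time_ms": 50, "accuracy": 0.60},
    {"depth": 2, "messages": 100, "time_ms": 120, "accuracy": 0.85},
    {"depth": 3, "messages": 1000, "time_ms": 300, "accuracy": 0.99},
]

weights = ahp_weights(CRITERIA_MATRIX)
print("weights (messages, time, accuracy):", weights)
print("consistency ratio:", consistency_ratio(CRITERIA_MATRIX, weights))
print("chosen depth:", choose_depth(CANDIDATES, weights)["depth"])
```

In this sketch, shifting the pairwise judgements towards network cost pushes the choice to shallower query depths, while favouring accuracy pushes it deeper, which mirrors the depth adaptation described in the abstract.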
