Scalable Execution of Continuous Aggregation Queries over Web Data

Data delivered over the Internet is increasingly used to provide dynamic, personalized user experiences. The content delivered to users is created by executing queries over fast-changing data from distributed data sources. Because these queries require data from multiple sources, they are executed at intermediate proxies or data aggregators. The authors discuss techniques for executing aggregation queries over distributed data that minimize the number of messages exchanged among data sources, aggregators, and users. They examine the problem in terms of query type, aggregation function, query imprecision, and whether aggregators obtain data from sources through pull- or push-based mechanisms.
