Moirae: History-Enhanced Monitoring

In this paper, we investigate the benefits and challenges of integrating history into a near-real-time monitoring system, and we present a general-purpose continuous monitoring engine, called Moirae, that supports this integration. Moirae is designed to enable different types of queries over live and historical data. In particular, Moirae supports (1) queries that look up specific historical information for each newly detected event and (2) queries that complement new events with information about similar past events. Moirae focuses on applications where querying a historical log in its entirety would be too slow to meet application needs and could yield an overwhelming number of results. The goal of the system is to produce the most relevant approximate results quickly and, when necessary, additional, more precise results incrementally. We discuss the challenges of integrating history into a continuous monitoring engine, present the design of Moirae, and show how the proposed architecture supports the above types of queries.