Enabling persistent queries for cross-aggregate performance monitoring

Monitoring the performance of the underlying network, storage, and computational resources is essential for distributed, data-intensive applications. Increasingly, these applications need performance information from multiple aggregates, and tools must make real-time steering decisions based on that feedback. As scale and complexity grow, so do the volume and velocity of monitoring data, which poses scalability challenges. In this work, we have developed a persistent query agent (PQA) that provides real-time application and network performance feedback to clients and applications, enabling them to adapt dynamically. The PQA supports federated performance monitoring by interacting with multiple aggregates and performance monitoring sources. Clients and applications register their events of interest as declarative queries; through a publish-subscribe framework, the PQA then sends them triggers asynchronously when relevant performance events occur. The PQA builds on a complex event processing (CEP) framework to manage and execute these queries, which are expressed in a standard SQL-like query language. Rather than storing all monitoring data for later analysis, the PQA observes performance event streams in real time and runs continuous queries over them. We present the design and architecture of the PQA and describe several relevant use cases.
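To make the subscribe-and-trigger flow concrete, the following is a minimal Python sketch, not the PQA's implementation. It assumes a single in-process agent and replaces both the SQL-like query language and the CEP engine with a hand-written sliding-window predicate. The names `Event`, `ContinuousQuery`, and `PersistentQueryAgent`, the metric `throughput_mbps`, and the 100 Mbps threshold are all hypothetical. A client subscribes a query roughly equivalent to "report any source whose average throughput over its last three measurements falls below 100 Mbps", and the agent evaluates it against each event as it arrives instead of storing the stream.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Event:
    """One monitoring measurement (hypothetical schema)."""
    source: str    # e.g. a measurement host in one aggregate
    metric: str    # e.g. "throughput_mbps"
    value: float


@dataclass
class ContinuousQuery:
    """Hypothetical stand-in for a declarative query such as:
    SELECT source, AVG(value) FROM events WHERE metric = 'throughput_mbps'
    over the last `window` events per source HAVING AVG(value) < threshold.
    """
    metric: str
    window: int
    threshold: float
    callback: Callable[[str, float], None]
    # Per-source sliding windows; deque(maxlen=...) evicts the oldest value,
    # so memory stays bounded no matter how long the stream runs.
    buffers: Dict[str, Deque[float]] = field(default_factory=dict)

    def on_event(self, event: Event) -> None:
        if event.metric != self.metric:
            return  # this query ignores other metrics
        buf = self.buffers.setdefault(event.source, deque(maxlen=self.window))
        buf.append(event.value)
        if len(buf) == self.window:
            avg = sum(buf) / self.window
            # Fires on every full window whose average is below the threshold.
            if avg < self.threshold:
                self.callback(event.source, avg)


class PersistentQueryAgent:
    """Minimal publish-subscribe hub: queries persist, events flow through."""

    def __init__(self) -> None:
        self._queries: List[ContinuousQuery] = []

    def subscribe(self, query: ContinuousQuery) -> None:
        self._queries.append(query)

    def publish(self, event: Event) -> None:
        # Each incoming event is matched against every registered query;
        # the event itself is not stored after evaluation.
        for query in self._queries:
            query.on_event(event)


if __name__ == "__main__":
    pqa = PersistentQueryAgent()
    pqa.subscribe(ContinuousQuery(
        "throughput_mbps", window=3, threshold=100.0,
        callback=lambda src, avg: print(f"trigger: {src} avg={avg:.1f} Mbps"),
    ))
    for v in [120.0, 90.0, 80.0, 70.0]:
        pqa.publish(Event("site-a", "throughput_mbps", v))
```

Callbacks here run synchronously for brevity; the asynchronous delivery described above would place a message queue or broker between `publish` and the subscriber. Bounding each per-source buffer by the window length is what lets a continuous query run indefinitely without retaining the full monitoring history.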
