Overhead Study of Telegraf as a Real-Time Monitoring Agent

Large-scale distributed systems have become an essential part of everyday life. These systems comprise a large number of hardware and software components, which often interact in complex and unpredictable ways. Operating such systems requires centralized monitoring to understand their overall state. Although running metric-collection software on a server is now common practice, the impact that such software has on the base system often goes unstudied. This impact is especially important in low-power IoT applications. According to our review, the overhead that one particular agent, Telegraf, adds to the base system has never been formally studied. In this work, we conducted several experiments to study how Telegraf affects the base system in two scenarios: a datacenter server and an IoT node. The results show that Telegraf is lightweight and suitable to serve as a real-time monitoring agent in both scenarios.
