Tuning Logstash Garbage Collection for High Throughput in a Monitoring Platform

The collection and aggregation of monitoring data from distributed applications is an important topic. At the scale of these applications, such as those designed for Big Data, the performance of the services that parse and aggregate logs becomes a key issue. Logstash is a well-known open source framework for centralizing and parsing both structured and unstructured monitoring data. As with many parsing applications, throttling is a common problem when incoming data exceeds Logstash's processing capacity. The conventional approach to improving performance is to increase the number of workers and the buffer size. However, it is unknown whether these measures resolve the issue when scaling to thousands of nodes. In this paper, we profile the Java virtual machine and optimize garbage collection to fine-tune a Logstash instance in the DICE monitoring platform and increase its throughput. We developed a Logstash shipper simulation tool, capable of simulating thousands of monitored nodes, to transfer simulated data to the Logstash instance. The results show that minimizing the impact of garbage collection, as we suggest, increases Logstash throughput considerably.
