Modeling and Verifying Producer-Consumer Communication in Kafka Using CSP

Apache Kafka is a popular distributed publish-subscribe messaging system, available both as open source and with commercial support, that achieves high throughput, low latency, and good load balancing. Message delivery between producers and consumers is one of Kafka's core functions, so the reliability and correctness of that delivery are a central concern. Formal methods make it possible to analyze rigorously whether a model of the system is trustworthy, which makes it worthwhile to study producer-consumer communication from a formal perspective. In this paper, we model the communication between producers and consumers in Kafka using the process algebra CSP (Communicating Sequential Processes). We then use the model checker PAT (Process Analysis Toolkit) to verify five properties of the model: Deadlock Freedom, Acknowledgement Mechanism, Parallelism, Sequentiality, and Fault Tolerance. The verification results show that our model of message transmission in Kafka satisfies its specification with respect to these properties, which supports the conclusion that the communication mechanism is reliable.
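To give a feel for what checking properties such as Deadlock Freedom and Sequentiality involves, the following is a minimal explicit-state sketch in Python. It is not the CSP model from the paper and does not use PAT. The one-slot broker buffer, the message count `N`, the state layout, and the event names `send`, `ack`, and `fetch` are all assumptions made for illustration.

```python
from collections import deque

N = 3  # number of messages in this toy run (assumed, not from the paper)

# State layout (hypothetical): (sent, acked, buf, consumed)
#   sent     - how many messages the producer has sent so far
#   acked    - how many of those the broker has acknowledged
#   buf      - message id held in the broker's one-slot buffer, or None
#   consumed - tuple of message ids the consumer has fetched, in order
INITIAL = (0, 0, None, ())


def successors(state):
    """Yield (event, next_state) for every event enabled in `state`."""
    sent, acked, buf, consumed = state
    # The producer sends the next message only after the previous one
    # has been acknowledged and the broker's buffer is free.
    if sent < N and sent == acked and buf is None:
        yield "send", (sent + 1, acked, sent, consumed)
    # The broker acknowledges the most recent unacknowledged message.
    if acked < sent:
        yield "ack", (sent, acked + 1, buf, consumed)
    # The consumer fetches whatever the broker currently holds.
    if buf is not None:
        yield "fetch", (sent, acked, None, consumed + (buf,))


def is_final(state):
    """True when every message has been sent, acknowledged, and consumed."""
    return state == (N, N, None, tuple(range(N)))


def explore(initial):
    """Breadth-first search over all interleavings of the three events."""
    seen = {initial}
    queue = deque([initial])
    deadlocks = []
    while queue:
        state = queue.popleft()
        enabled = list(successors(state))
        # A deadlock is a non-final state with no enabled event.
        if not enabled and not is_final(state):
            deadlocks.append(state)
        for _event, nxt in enabled:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, deadlocks


def in_order(state):
    """Sequentiality check: consumed ids form the prefix 0, 1, 2, ..."""
    consumed = state[3]
    return consumed == tuple(range(len(consumed)))


if __name__ == "__main__":
    states, deadlocks = explore(INITIAL)
    print("deadlock free:", not deadlocks)
    print("sequential:", all(in_order(s) for s in states))
```

A model checker such as PAT performs this kind of exhaustive exploration on a far richer model. The sketch only shows the idea of enumerating reachable states and checking a property on each of them.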
