Self checking network protocols: a monitor based approach

The wide deployment of high-speed computer networks has made distributed systems ubiquitous in today's connected world. These systems are hard to diagnose: the machines hosting distributed applications are heterogeneous, the applications often run legacy code whose source is unavailable, the systems are very large in scale, and they often carry soft real-time guarantees. In this paper, we target the problem of online detection of disruptions through a generic external entity called the Monitor, which observes the messages exchanged between protocol participants and deduces any ongoing disruption by matching them against a rule base composed of combinatorial and temporal rules. The Monitor architecture is application neutral; the rule base makes it specific to a protocol. To make the detection infrastructure scalable and dependable, we extend it to a hierarchical Monitor structure. The infrastructure is applied to a streaming video application running on a reliable multicast protocol called TRAM, installed on a campus-wide network. The evaluation demonstrates the scalability of the Monitor infrastructure and its detection coverage under different kinds of faults, for both the single-level and the hierarchical arrangements.
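
The idea of an external Monitor that matches observed messages against a rule base can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Message` fields, the message kinds (`DATA`, `ACK`, `NACK`), the rule constructors `response_within` and `at_most`, and the `Monitor` class. The counting rule is only a simple stand-in for the paper's combinatorial rules, whose actual form is not described in the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Message:
    time: float   # observation timestamp in seconds (hypothetical field)
    sender: str   # protocol participant that sent the message
    kind: str     # hypothetical message type, e.g. "DATA", "ACK", "NACK"


# A rule receives the observed history and the current time,
# and returns True when the rule is violated.
Rule = Callable[[List[Message], float], bool]


def response_within(trigger: str, response: str, deadline: float) -> Rule:
    """Temporal rule: every `trigger` must be followed by a `response`
    no later than `deadline` seconds after it."""
    def violated(history: List[Message], now: float) -> bool:
        for m in history:
            if m.kind != trigger:
                continue
            answered = any(
                r.kind == response and m.time < r.time <= m.time + deadline
                for r in history
            )
            # Only flag once the deadline has actually passed.
            if not answered and now > m.time + deadline:
                return True
        return False
    return violated


def at_most(kind: str, limit: int, window: float) -> Rule:
    """Counting rule: no more than `limit` messages of `kind`
    in the last `window` seconds."""
    def violated(history: List[Message], now: float) -> bool:
        recent = [m for m in history
                  if m.kind == kind and now - window <= m.time <= now]
        return len(recent) > limit
    return violated


@dataclass
class Monitor:
    rules: Dict[str, Rule]                       # protocol-specific rule base
    history: List[Message] = field(default_factory=list)

    def observe(self, msg: Message) -> None:
        """Record a message seen on the network; participants are not modified."""
        self.history.append(msg)

    def check(self, now: float) -> List[str]:
        """Return the names of all rules violated at time `now`."""
        return [name for name, rule in self.rules.items()
                if rule(self.history, now)]


monitor = Monitor(rules={
    "ack_deadline": response_within("DATA", "ACK", deadline=2.0),
    "nack_flood": at_most("NACK", limit=3, window=5.0),
})
monitor.observe(Message(0.0, "sender", "DATA"))
monitor.observe(Message(1.0, "receiver", "ACK"))
early = monitor.check(now=1.5)        # first DATA was acknowledged in time
monitor.observe(Message(3.0, "sender", "DATA"))
late = monitor.check(now=6.0)         # second DATA never acknowledged
print(early, late)
```

In this sketch the `Monitor` class itself knows nothing about the protocol; only the rule base passed to it does, which mirrors the application-neutral design described above. A hierarchical arrangement could, under the same assumptions, be approximated by having several such local monitors forward their violation reports to a higher-level one.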
