Reliable Distributed Network Management by Replication

This paper presents a new clustering architecture for SNMP agents that supports semi-active replication of managed objects. A cluster of agents provides fault-tolerant object functionality: if an agent in a given cluster crashes, its replicated managed objects may still be accessed through a peer cluster. The proposed architecture is structured in three layers. The lower layer corresponds to the managed objects at the network elements. The middle layer contains management entities called clusters, which monitor and replicate managed objects. The upper layer allows the definition of management clusters as well as the relationships between clusters. A practical tool implementing the architecture is presented. The impact of replication on network performance is evaluated, and a probabilistic analysis of replicated object consistency is given.
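
The three layers can be illustrated with a minimal sketch. It is not the tool described in the paper and it does not speak SNMP; the names Agent, Cluster, monitor_and_replicate, read_replica and demo are hypothetical, and replication is reduced to copying polled values into peer clusters. It does not model the semi-active replication protocol itself, only the layering and the idea that objects of a crashed agent remain readable through a peer cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Lower layer (assumed model): an agent exposing managed objects as OID -> value."""
    name: str
    objects: dict = field(default_factory=dict)
    alive: bool = True

    def get(self, oid):
        # A crashed agent cannot answer requests.
        if not self.alive:
            raise ConnectionError(f"agent {self.name} is down")
        return self.objects[oid]


@dataclass
class Cluster:
    """Middle layer (assumed model): monitors member agents and stores replicas."""
    name: str
    members: list = field(default_factory=list)
    # Upper layer (assumed model): the relationship between clusters.
    peers: list = field(default_factory=list)
    # Replica store keyed by (agent name, OID).
    replicas: dict = field(default_factory=dict)

    def monitor_and_replicate(self, oids):
        """Poll each live member and copy the values to this cluster and its peers."""
        for agent in self.members:
            if not agent.alive:
                continue
            for oid in oids:
                value = agent.get(oid)
                for cluster in [self, *self.peers]:
                    cluster.replicas[(agent.name, oid)] = value

    def read_replica(self, agent_name, oid):
        """Return the last replicated value, even if the agent has crashed."""
        return self.replicas[(agent_name, oid)]


def demo():
    sys_uptime = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0 from MIB-II
    router = Agent("router1", {sys_uptime: 4200})
    c1 = Cluster("c1", members=[router])
    c2 = Cluster("c2")
    c1.peers.append(c2)

    c1.monitor_and_replicate([sys_uptime])
    router.alive = False  # simulate a crash of the monitored agent

    # The object is still available through the peer cluster.
    return c2.read_replica("router1", sys_uptime)
```

Calling demo() reads the replica of the crashed agent's object from the peer cluster c2. A real deployment would replace the in-memory dictionaries with SNMP requests and a group communication layer, neither of which is modeled here.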
