Fault localization in service-based systems hosted in mobile ad hoc networks

Fault localization refers, in general, to techniques for identifying the likely root causes of failures observed in systems built from components. Fault localization in systems deployed on mobile ad hoc networks (MANETs) is particularly challenging for three reasons. Such systems are subject to a wider variety and a higher incidence of faults than systems deployed in fixed networks. The resources available for tracking fault symptoms are severely limited. Many sources of faults in MANETs are transient by nature. We present a suite of three methods, each responsible for part of the overall task of localizing faults in service-based systems hosted on MANETs. First, we describe a dependence discovery method, designed specifically for this environment, that yields dynamic snapshots of dependence relationships discovered through decentralized observations of service interactions. Second, we present a fault localization method that employs both Bayesian and timing-based reasoning to analyze the dependence data produced by the dependence discovery method in the context of a specific fault propagation model, deriving a ranked list of candidate fault locations. Third, we present an epidemic protocol for transferring dependence and symptom data between nodes of MANETs with low connectivity. The protocol creates a network-wide synchronization overlay and transfers the data over intermediate nodes in periodic synchronization cycles. We introduce a new tool for simulating service-based systems hosted on MANETs and use it to evaluate several operational aspects of the methods. We also present an implementation of the methods in Java EE and evaluate it in an emulation environment. Finally, we report the results of an extensive set of experiments that explore a wide range of operational conditions to evaluate the accuracy and performance of our methods.
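The abstract does not say how dependence snapshots are built from decentralized observations. As one hedged illustration, assume each node logs (caller, callee, timestamp) triples for the service calls it sees, and that a dependence drops out of the snapshot once it has not been re-observed for a fixed window. A per-node observer and a merge of several nodes' snapshots might then look like the sketch below. `Interaction`, `DependenceObserver`, `window`, and `merge_snapshots` are hypothetical names, not part of the described implementation.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One observed call from a caller service to a callee service."""
    caller: str
    callee: str
    timestamp: float


class DependenceObserver:
    """Hypothetical per-node observer of service interactions."""

    def __init__(self, window: float) -> None:
        # Assumption: a dependence not re-observed within `window` seconds is
        # treated as gone, reflecting the transient nature of MANET links.
        self.window = window
        self.last_seen: dict[tuple[str, str], float] = {}

    def observe(self, event: Interaction) -> None:
        """Record the latest time the (caller, callee) pair was seen."""
        key = (event.caller, event.callee)
        previous = self.last_seen.get(key, event.timestamp)
        self.last_seen[key] = max(previous, event.timestamp)

    def snapshot(self, now: float) -> dict[str, set[str]]:
        """Return caller -> callees for pairs seen within the last `window` seconds."""
        graph: dict[str, set[str]] = defaultdict(set)
        for (caller, callee), seen in self.last_seen.items():
            if now - seen <= self.window:
                graph[caller].add(callee)
        return dict(graph)


def merge_snapshots(*snapshots: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union snapshots taken on several nodes into one dependence graph."""
    merged: dict[str, set[str]] = defaultdict(set)
    for snap in snapshots:
        for caller, callees in snap.items():
            merged[caller] |= callees
    return dict(merged)
```

For example, with `window=5.0`, a `client -> auth` call seen at time 1 and a `client -> db` call seen at time 10 yield the snapshot `{"client": {"db"}}` at time 12, because the older dependence has expired.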
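To make "deriving a ranked list of candidate fault locations" concrete, here is a minimal sketch of Bayesian ranking under a simplified propagation model assumed only for illustration. Exactly one service is faulty. A fault propagates to every client whose calls transitively reach that service. Affected clients report a failure with probability `p_detect`, and unaffected clients report a spurious failure with probability `p_false`. Timing-based reasoning is omitted. `reachable`, `rank_candidates`, `p_detect`, and `p_false` are hypothetical and are not the paper's fault propagation model or API.

```python
from __future__ import annotations


def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """Services that `start` depends on directly or transitively, plus `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        for nxt in graph.get(stack.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def rank_candidates(
    graph: dict[str, set[str]],
    failed: set[str],
    succeeded: set[str],
    p_detect: float = 0.9,
    p_false: float = 0.05,
) -> list[tuple[str, float]]:
    """Rank single-fault hypotheses by posterior probability (uniform prior assumed)."""
    observers = failed | succeeded
    services = set(graph) | observers
    for callees in graph.values():
        services |= callees
    # Precompute which services each observing client can reach.
    reach = {client: reachable(graph, client) for client in observers}
    scores: dict[str, float] = {}
    for candidate in services:
        likelihood = 1.0
        for client in observers:
            # Assumed propagation model: a fault reaches every client that depends on it.
            p_fail = p_detect if candidate in reach[client] else p_false
            likelihood *= p_fail if client in failed else 1.0 - p_fail
        # Uniform prior, so the posterior is proportional to the likelihood.
        scores[candidate] = likelihood
    total = sum(scores.values())
    posteriors = ((service, score / total) for service, score in scores.items())
    return sorted(posteriors, key=lambda item: item[1], reverse=True)
```

With the dependence graph `{"A": {"db"}, "B": {"db", "cache"}, "C": {"cache"}}`, failures reported by `A` and `B`, and a success reported by `C`, `db` ranks first. It is the only candidate that explains both failures while leaving `C` unaffected.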
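The periodic synchronization cycles can be pictured with a generic push-pull anti-entropy exchange. This is a standard epidemic technique substituted here for illustration, and the paper's overlay construction is not reproduced. The sketch assumes each record carries an integer version. It also assumes that in each cycle only the node pairs currently in radio contact (the hypothetical `links` list) exchange records. Under these assumptions, data reaches a node that is never directly connected to its source by being carried through intermediate nodes. `Node`, `put`, `sync_with`, and `run_cycle` are hypothetical names.

```python
from __future__ import annotations


class Node:
    """Hypothetical MANET node holding versioned dependence and symptom records."""

    def __init__(self, name: str) -> None:
        self.name = name
        # key -> (version, value); assumption: higher version means newer data.
        self.store: dict[str, tuple[int, str]] = {}

    def put(self, key: str, version: int, value: str) -> None:
        """Keep a record only if it is newer than the local copy."""
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def sync_with(self, peer: Node) -> None:
        """Push-pull anti-entropy: afterwards both nodes hold the newest version of every key."""
        for key, (version, value) in list(self.store.items()):
            peer.put(key, version, value)
        for key, (version, value) in list(peer.store.items()):
            self.put(key, version, value)


def run_cycle(nodes: dict[str, Node], links: list[tuple[str, str]]) -> None:
    """One synchronization cycle over the node pairs currently in radio contact."""
    for a, b in links:
        nodes[a].sync_with(nodes[b])
```

If node `A` records a symptom, the cycle `[("A", "B")]` followed by the cycle `[("B", "C")]` delivers it to `C`, although `A` and `C` are never in contact. A later exchange with a higher version replaces the older copy.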
