Automated Rule-Based Diagnosis Through a Distributed Monitor System
Saurabh Bagchi | Paulo Veríssimo | Miguel Correia | Gunjan Khanna | Mike Yu Cheng | Padma Varadharajan
[1] Richard Mortier,et al. Magpie: Online Modelling and Performance-aware Systems , 2003, HotOS.
[2] Marcos K. Aguilera,et al. Performance debugging for distributed systems of black boxes , 2003, SOSP '03.
[3] Miguel Correia,et al. Low complexity Byzantine-resilient consensus , 2005, Distributed Computing.
[4] Miroslaw Malek,et al. The consensus problem in fault-tolerant computing , 1993, CSUR.
[5] Friedemann Mattern,et al. Detecting causal relationships in distributed computations: In search of the holy grail , 1994, Distributed Computing.
[6] Saurabh Bagchi,et al. Dependency Analysis in Distributed Systems using Fault Injection: Application to Problem Determination in an e-commerce Environment , 2001, DSOM.
[7] Henrique Madeira,et al. Experimental evaluation of the fail-silent behavior in computers without error masking , 1994, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing.
[8] Kenneth L. McMillan,et al. Symbolic model checking , 1992 .
[9] Joseph L. Hellerstein. GAP: A General Approach to Quantitative Diagnosis of Performance Problems , 2004, Journal of Network and Systems Management.
[10] A. Jefferson Offutt,et al. Generating Tests from UML Specifications , 1999, UML.
[11] Kang G. Shin,et al. On Probabilistic Diagnosis of Multiprocessor Systems Using Multiple Syndromes , 1994, IEEE Trans. Parallel Distributed Syst..
[12] S. Louis Hakimi,et al. On Models for Diagnosable Systems and Probabilistic Fault Diagnosis , 1976, IEEE Transactions on Computers.
[13] Guy Juanole,et al. Observer-A Concept for Formal On-Line Validation of Distributed Systems , 1994, IEEE Trans. Software Eng..
[14] S. Louis Hakimi,et al. An optimal algorithm for distributed system level diagnosis , 1991, [1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium.
[15] Takashi Nanya,et al. A Hierarchical Adaptive Distributed System-Level Diagnosis Algorithm , 1998, IEEE Trans. Computers.
[16] A. Avizienis,et al. Dependable computing: From concepts to design diversity , 1986, Proceedings of the IEEE.
[17] Mischa Schwartz,et al. Schemes for fault identification in communication networks , 1995, TNET.
[18] Robert K. Brayton,et al. Partial-Order Reduction in Symbolic State Space Exploration , 1997, CAV.
[19] Achour Mostéfaoui,et al. Crash-resilient time-free eventual leadership , 2004, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004..
[20] Aaron B. Brown,et al. An active approach to characterizing dynamic dependencies for problem determination in a distributed environment , 2001, 2001 IEEE/IFIP International Symposium on Integrated Network Management Proceedings. Integrated Network Management VII. Integrated Management Strategies for the New Millennium (Cat. No.01EX470).
[21] Ravishankar K. Iyer,et al. Measurement-Based Analysis of Error Latency , 1987, IEEE Transactions on Computers.
[22] Kavita Ravi,et al. High-density reachability analysis , 1995, ICCAD.
[23] Sam Toueg,et al. Unreliable failure detectors for reliable distributed systems , 1996, JACM.
[24] Edmund M. Clarke,et al. Representing circuits more efficiently in symbolic model checking , 1991, 28th ACM/IEEE Design Automation Conference.
[25] Kenneth L. McMillan,et al. Symbolic model checking: an approach to the state explosion problem , 1992 .
[26] Saurabh Bagchi,et al. Failure handling in a reliable multicast protocol for improving buffer utilization and accommodating heterogeneous receivers , 2004, 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings..
[27] Peter M. Chen,et al. How fail-stop are faulty programs? , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[28] Miguel Castro,et al. Proactive recovery in a Byzantine-fault-tolerant system , 2000, OSDI.
[29] Dah-Ming Chiu,et al. A congestion control algorithm for tree-based reliable multicast protocols , 2002, Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.
[30] Kang G. Shin,et al. Optimal and Efficient Probabilistic Distributed Diagnosis Schemes , 1993, IEEE Trans. Computers.
[31] Robbert van Renesse,et al. Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining , 2003, TOCS.
[32] Ozalp Babaoglu,et al. Consistent global states of distributed systems: fundamental concepts and mechanisms , 1993 .
[33] Ravishankar K. Iyer,et al. A framework for database audit and control flow checking for a wireless telephone network controller , 2001, 2001 International Conference on Dependable Systems and Networks.
[34] Leslie Lamport,et al. Time, clocks, and the ordering of events in a distributed system , 1978, CACM.
[35] Christophe Meudec,et al. Automatic generation of software test cases from formal specifications , 1998 .
[36] Mohammad Zulkernine,et al. A compositional approach to monitoring distributed systems , 2002, Proceedings International Conference on Dependable Systems and Networks.
[37] Friedemann Mattern,et al. Detecting causal relationships in distributed computations , 1994 .
[38] Richard W. Buskens,et al. Distributed on-line diagnosis in the presence of arbitrary faults , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[39] Gernot Metze,et al. On the Connection Assignment Problem of Diagnosable Systems , 1967, IEEE Trans. Electron. Comput..
[40] Samuel T. King,et al. Backtracking intrusions , 2003, SOSP '03.
[41] Edmund M. Clarke,et al. Symbolic Model Checking with Partitioned Transition Relations , 1991, VLSI.
[42] L. Alvisi,et al. A Survey of Rollback-Recovery Protocols , 2002 .
[43] Saurabh Bagchi,et al. Self checking network protocols: a monitor based approach , 2004, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004..
[44] Sam Toueg,et al. Asynchronous consensus and broadcast protocols , 1985, JACM.
[45] Miguel Correia,et al. The Design of a COTS Real-Time Distributed Security Kernel , 2002, EDCC.
[46] Miguel Correia,et al. How to tolerate half less one Byzantine nodes in practical distributed systems , 2004, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004..
[47] Sampath Rangarajan,et al. Probabilistic diagnosis of multiprocessor systems with arbitrary connectivity , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.