Synoptic: Summarizing System Logs with Refinement

Distributed systems are often difficult to debug and understand. A typical way of gaining insight into system behavior is by inspecting execution logs. However, manual inspection of logs is an arduous process. To support this task we developed Synoptic. Synoptic outputs a concise graph representation of logged events that captures temporal invariants mined from the log. We applied Synoptic to synthetic and real distributed system logs and found that it augmented a distributed system designer's understanding of system behavior with reasonable overhead for an offline analysis tool. In contrast to prior approaches, Synoptic uses a combination of refinement and coarsening to explore the space of representations. Additionally, it infers temporal event invariants to capture distributed system semantics. These invariants drive the exploration process and are satisfied by the final representation.

[1]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[2]  Edmund M. Clarke,et al.  Counterexample-guided abstraction refinement , 2003, 10th International Symposium on Temporal Representation and Reasoning, 2003 and Fourth International Conference on Temporal Logic. Proceedings..

[3]  Jim Gray,et al.  Notes on Data Base Operating Systems , 1978, Advanced Course: Operating Systems.

[4]  William G. Griswold,et al.  Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[5]  Rajeev Gandhi,et al.  SALSA: Analyzing Logs as StAte Machines (CMU-PDL-08-111) , 2008 .

[6]  E. Mark Gold,et al.  Language Identification in the Limit , 1967, Inf. Control..

[7]  Viktor Kuncak,et al.  CrystalBall: Predicting and Preventing Inconsistencies in Deployed Distributed Systems , 2009, NSDI.

[8]  Haikady N. Nagaraja,et al.  Inference in Hidden Markov Models , 2006, Technometrics.

[9]  John Domingue,et al.  Magpie: supporting browsing and navigation on the semantic web , 2004, IUI '04.

[10]  Haifeng Chen,et al.  Multi-resolution Abnormal Trace Detection Using Varied-length N-grams and Automata , 2005, ICAC.

[11]  Eric Moulines,et al.  Inference in Hidden Markov Models (Springer Series in Statistics) , 2005 .

[12]  George S. Avrunin,et al.  Patterns in property specifications for finite-state verification , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[13]  David Evans,et al.  Automatically inferring temporal properties for program evolution , 2004, 15th International Symposium on Software Reliability Engineering.

[14]  David Evans,et al.  Dynamically inferring temporal properties , 2004, PASTE.

[15]  Davide Sangiorgi,et al.  On the origins of bisimulation and coinduction , 2009, TOPL.

[16]  Rajeev Gandhi,et al.  Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop , 2009, HotCloud.

[17]  Leslie Lamport,et al.  The part-time parliament , 1998, TOCS.

[18]  Dana Angluin,et al.  Finding Patterns Common to a Set of Strings , 1980, J. Comput. Syst. Sci..

[19]  Kathi Fisler,et al.  Bisimulation Minimization and Symbolic Model Checking , 2002, Formal Methods Syst. Des..

[20]  David Lo,et al.  Automatic steering of behavioral model inference , 2009, ESEC/SIGSOFT FSE.

[21]  Randy H. Katz,et al.  X-Trace: A Pervasive Network Tracing Framework , 2007, NSDI.

[22]  Michael I. Jordan,et al.  Detecting large-scale system problems by mining console logs , 2009, SOSP '09.

[23]  Edmund M. Clarke,et al.  Counterexample-Guided Abstraction Refinement , 2000, CAV.

[24]  Jerome A. Feldman,et al.  On the Synthesis of Finite-State Machines from Samples of Their Behavior , 1972, IEEE Transactions on Computers.

[25]  Peter Radford,et al.  Petri Net Theory and the Modeling of Systems , 1982 .

[26]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[27]  Leonard Pitt,et al.  The minimum consistent DFA problem cannot be approximated within any polynomial , 1993, JACM.

[28]  Dimitra Giannakopoulou,et al.  From States to Transitions: Improving Translation of LTL Formulae to Büchi Automata , 2002, FORTE.

[29]  Miguel Oom Temudo de Castro,et al.  Practical Byzantine fault tolerance , 1999, OSDI '99.

[30]  Ding Yuan,et al.  SherLog: error diagnosis by connecting clues from run-time logs , 2010, ASPLOS XV.

[31]  Y VardiMoshe,et al.  Bisimulation Minimization and Symbolic Model Checking , 2002 .

[32]  Haoxiang Lin,et al.  MODIST: Transparent Model Checking of Unmodified Distributed Systems , 2009, NSDI.

[33]  Qiang Fu,et al.  Mining dependency in distributed systems through unstructured logs analysis , 2010, OPSR.

[34]  Manuel Blum,et al.  Toward a Mathematical Theory of Inductive Inference , 1975, Inf. Control..

[35]  Leonardo Mariani,et al.  Automatic generation of software behavioral models , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[36]  Robert E. Tarjan,et al.  Three Partition Refinement Algorithms , 1987, SIAM J. Comput..

[37]  Gary L. Peterson,et al.  An O(nlog n) Unidirectional Algorithm for the Circular Extrema Problem , 1982, TOPL.

[38]  Thomas E. Anderson,et al.  Reverse traceroute , 2010, NSDI.

[39]  Tapio Elomaa Partition-Refining Algorithms for Learning Finite State Automata , 2002, ISMIS.

[40]  Rajeev Gandhi,et al.  SALSA: Analyzing Logs as StAte Machines , 2008, WASL.