Sequence Pre-processing: Focusing Analysis of Log Event Data

Many computational systems generate log event data to help developers understand how applications are used in the wild. Many commercial analysis tools exist, but they tend to treat log event data as a “bag of events” rather than as a collection of observed sequences, each representing an individual session. Although recent work supports the visual analysis of event sequence data, log files tend to be large and noisy in ways that can foul downstream analyses. In this work, we identify recurring noise problems that arise when analyzing this data, and argue that pre-processing methods are a valuable tool both for focusing data for downstream analysis and for providing provenance support for visual analytics tools. These pre-processing methods can be applied interactively, in conjunction with analysis tools, to iteratively refine rules that streamline visual analysis. Through several case studies, we identify common sources of noise in log files and demonstrate how our proposed pre-processing methods help minimize the excess data that reaches downstream analysis tools.
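To make the contrast between a “bag of events” and per-session sequences concrete, the following is a minimal Python sketch, not the method proposed in this work. The record layout `(session_id, timestamp, event_name)`, the sample data, and the two rules shown (dropping a hypothetical `heartbeat` event and collapsing consecutive repeats) are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import groupby

# Hypothetical log records: (session_id, timestamp, event_name).
LOG = [
    ("s1", 1, "open"),
    ("s2", 1, "open"),
    ("s1", 2, "scroll"),
    ("s1", 3, "scroll"),
    ("s1", 4, "click"),
    ("s2", 5, "heartbeat"),
    ("s2", 6, "close"),
]


def bag_of_events(log):
    """Count events with no regard to session or order."""
    return Counter(event for _, _, event in log)


def to_sequences(log):
    """Group events by session and order each group by timestamp."""
    sessions = defaultdict(list)
    for session_id, timestamp, event in log:
        sessions[session_id].append((timestamp, event))
    return {sid: [e for _, e in sorted(evts)] for sid, evts in sessions.items()}


def drop_events(sequence, noisy):
    """Illustrative rule: remove event types the analyst marks as noise."""
    return [e for e in sequence if e not in noisy]


def collapse_repeats(sequence):
    """Illustrative rule: merge runs of the same consecutive event into one."""
    return [key for key, _ in groupby(sequence)]


def preprocess(sequences, noisy=frozenset({"heartbeat"})):
    """Apply both illustrative rules to every session sequence."""
    return {
        sid: collapse_repeats(drop_events(seq, noisy))
        for sid, seq in sequences.items()
    }


if __name__ == "__main__":
    print(bag_of_events(LOG))
    print(preprocess(to_sequences(LOG)))
```

The bag view only reports totals per event type, while the sequence view preserves what happened within each session. In an interactive setting, the set of noisy events and the list of rules would presumably be refined as the analyst inspects the output.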
