MithriLog: Near-Storage Accelerator for High-Performance Log Analytics

This paper presents MithriLog, a log analytics platform with near-storage accelerators for high-performance, cost- and power-efficient unstructured log processing. MithriLog offloads log analytics queries to an efficient near-storage FPGA implementation of a token querying engine, which takes advantage of the high internal bandwidth of storage devices while staying within the available chip resources. The engine is flexible enough to handle complex queries, including template search based on user-defined, tree-based template libraries, as well as concurrent execution of multiple queries. MithriLog also uses a log-optimized version of a simple, high-throughput compression algorithm to further improve the effective bandwidth of the backing storage. Evaluated with complex search queries on large real-world log datasets, MithriLog achieves an order of magnitude higher performance than software systems, even when those systems run on more expensive machines with enough DRAM to stage the entire dataset. Furthermore, MithriLog delivers constant performance regardless of query complexity, so its advantage over software grows as queries become more complex. By replacing costly DRAM with storage and power-hungry CPU threads with FPGAs, MithriLog dramatically improves the cost-effectiveness and accessibility of log analytics.
