Hyperion: High Volume Stream Archival for Retrospective Querying

Network monitoring systems that support data archiving and after-the-fact (retrospective) queries are useful for a multitude of purposes, such as anomaly detection and network and security forensics. Data archiving for such systems, however, is complicated by (a) data arrival rates, which may be hundreds of thousands of packets per second on a single link, and (b) the need for online indexing of this data to support retrospective queries. At these data rates, both common database index structures and general-purpose file systems perform poorly. This paper describes Hyperion, a system for archiving, indexing, and online retrieval of high-volume data streams. We employ a write-optimized stream file system for high-speed storage of simultaneous data streams, and a novel use of signature file indexes in a distributed multi-level index. We implement Hyperion on commodity hardware and conduct a detailed evaluation using synthetic data and real network traces. Our streaming file system, StreamFS, is shown to be fast enough to archive traces at over a million packets per second. The index allows queries over hours of data to complete in as little as 10-20 seconds, and the entire system is able to index and archive over 200,000 packets/sec while processing simultaneous online queries.
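
To make the signature file idea concrete, the following is a minimal sketch of a generic superimposed-coding signature index over blocks of packet records. It is not Hyperion's implementation: the record layout, the `src` key, the constants `SIG_BITS` and `NUM_HASHES`, and all function names are assumptions chosen for illustration. Each block gets one fixed-width signature formed by OR-ing together a few hashed bits per key; a query scans the small signatures and reads only blocks whose signature contains all of the key's bits, then filters out false positives.

```python
import hashlib

SIG_BITS = 1024   # signature width per block (assumed value)
NUM_HASHES = 3    # hash functions applied per key (assumed value)


def key_bits(key: str) -> int:
    """Return an integer mask with up to NUM_HASHES bits set for this key."""
    mask = 0
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % SIG_BITS)
    return mask


def build_index(blocks):
    """Build one superimposed signature per block (OR of all key masks)."""
    signatures = []
    for block in blocks:
        sig = 0
        for record in block:
            sig |= key_bits(record["src"])
        signatures.append(sig)
    return signatures


def query(blocks, signatures, src):
    """Check signatures first, then scan only the candidate blocks."""
    want = key_bits(src)
    hits = []
    for block, sig in zip(blocks, signatures):
        if sig & want == want:  # candidate block; may be a false positive
            hits.extend(r for r in block if r["src"] == src)
    return hits


# Hypothetical records: three blocks of archived packets keyed by source address.
blocks = [
    [{"src": "10.0.0.1", "len": 60}, {"src": "10.0.0.2", "len": 1500}],
    [{"src": "10.0.0.3", "len": 40}],
    [{"src": "10.0.0.1", "len": 52}],
]
signatures = build_index(blocks)
result = query(blocks, signatures, "10.0.0.1")
```

A signature can report a block as a candidate when the key is absent, but it never misses a block that contains the key, so the final exact-match scan keeps results correct while most blocks are skipped.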
