Adaptable Parsing of Real-Time Data Streams

Today's business processes are rarely carried out within a single company's domain. More often they involve geographically distributed entities that interact in loosely coupled cooperation. While cooperating, these entities generate transactional data streams, such as sequences of stock-market buy/sell orders, credit-card purchase records, Web server log entries, and electronic fund transfer orders. Such streams are often collections of events stored and processed locally, so they typically have ad hoc, heterogeneous formats. On the other hand, the elements of such streams usually share common semantics, and they can be profitably mined to derive combined, global events. In this paper, we present an approach to parsing heterogeneous data streams based on format-dependent grammars and the automatic generation of ad hoc parsers. Stream-specific parsers can be obtained dynamically and fully automatically, provided that the appropriate grammar, written in a common format, is fed into the system. We also present a fully working implementation that has been successfully integrated into a telecommunications environment for the real-time processing of billing-information flows.
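To illustrate the general idea of deriving a parser automatically from a format description, here is a minimal sketch. It assumes a deliberately simple, hypothetical "grammar": an ordered list of fields, each with a name, a regular expression, and a converter, joined by a single separator. This flat record description is far weaker than the context-free grammars the approach is built on and is not the paper's grammar notation. `make_parser`, the field names, and the two billing formats are all illustrative assumptions. The point it shows is that two streams with different layouts but shared semantics can each get a parser generated at runtime, and both parsers yield records with the same keys.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical grammar format: ordered (field_name, regex, converter) triples.
# Field names must be valid Python identifiers (they become named regex groups).
# This is an illustrative stand-in, not the grammar notation used in the paper.
Field = Tuple[str, str, Callable[[str], object]]


def make_parser(fields: List[Field], sep: str) -> Callable[[Iterable[str]], Iterator[Dict[str, object]]]:
    """Generate a stream parser from a declarative record description."""
    # Compile the description once into a single anchored regex with named groups.
    body = re.escape(sep).join(f"(?P<{name}>{pattern})" for name, pattern, _ in fields)
    record_re = re.compile(rf"^{body}$")
    converters = {name: conv for name, _, conv in fields}

    def parse(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
        # Records are parsed lazily, so the parser can consume an unbounded stream.
        for lineno, line in enumerate(lines, start=1):
            m = record_re.match(line.rstrip("\n"))
            if m is None:
                raise ValueError(f"record {lineno} does not match grammar: {line!r}")
            yield {k: converters[k](v) for k, v in m.groupdict().items()}

    return parse


# Two hypothetical billing formats: different layouts, same semantics.
fmt_a = [("caller", r"\d+", str), ("callee", r"\d+", str), ("secs", r"\d+", int)]
fmt_b = [("secs", r"\d+", int), ("caller", r"\d+", str), ("callee", r"\d+", str)]

parse_a = make_parser(fmt_a, sep=";")
parse_b = make_parser(fmt_b, sep="|")

if __name__ == "__main__":
    print(list(parse_a(["3905551234;3904445678;120\n"])))
    # -> [{'caller': '3905551234', 'callee': '3904445678', 'secs': 120}]
    print(list(parse_b(["75|3901112222|3903334444"])))
    # -> [{'secs': 75, 'caller': '3901112222', 'callee': '3903334444'}]
```

Because the generated parsers are plain generators, downstream code can merge or correlate their outputs without knowing which source format each record came from. A production system would instead feed a full grammar to a parser generator, which can also handle nested and recursive structure.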
