Matching Heterogeneous Events with Patterns

Large amounts of heterogeneous event data are increasingly generated, e.g., by online systems for Web services or by operational systems in enterprises. Because event data differ from traditional relational data, matching heterogeneous events is highly non-trivial. Event names are often opaque (e.g., mere obscure IDs), and existing structure-based matching techniques for relational data also perform poorly, because the dependency relationships between events have little discriminative power. We note that interesting patterns exist in the occurrence of events, and that these patterns may serve as discriminative features in event matching. In this paper, we formalize the problem of matching events with patterns. We propose a generic pattern-based matching framework that is compatible with existing structure-based techniques. To improve matching efficiency, we devise several bounds on matching scores for pruning. Recognizing the NP-hardness of the optimal event matching problem with patterns, we also propose an efficient heuristic. Finally, extensive experiments demonstrate the effectiveness of our pattern-based matching compared with approaches adapted from existing techniques, as well as the efficiency gained from the bounding, pruning, and heuristic methods.
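To make the idea of patterns as discriminative features concrete, the following is a minimal illustrative sketch, not the framework or scoring function of the paper. It assumes that a log is a list of traces, that a pattern is a tuple of consecutive events, that a pattern's frequency is the fraction of traces containing it, and that a mapping is scored by how closely each pattern's frequency in the first log agrees with the frequency of its mapped counterpart in the second. The names `pattern_freq`, `sim`, `score`, and `best_mapping`, the similarity formula, and the toy logs are all hypothetical. The exhaustive search over all mappings is feasible only for tiny inputs, which is consistent with the need for bounds, pruning, and heuristics noted above.

```python
from itertools import permutations


def pattern_freq(log, pattern):
    """Fraction of traces in which `pattern` occurs as consecutive events."""
    n = len(pattern)
    hits = sum(
        any(tuple(trace[i:i + n]) == pattern for i in range(len(trace) - n + 1))
        for trace in log
    )
    return hits / len(log)


def sim(f1, f2):
    """Assumed frequency similarity in [0, 1]; 1 means identical frequencies."""
    if f1 + f2 == 0:
        return 0.0
    return 1 - abs(f1 - f2) / (f1 + f2)


def score(mapping, log1, log2, patterns):
    """Sum of frequency similarities between each pattern and its mapped image."""
    total = 0.0
    for p in patterns:
        q = tuple(mapping[e] for e in p)  # translate the pattern into log2's names
        total += sim(pattern_freq(log1, p), pattern_freq(log2, q))
    return total


def best_mapping(log1, log2, patterns):
    """Exhaustively try every injective mapping from log1 events to log2 events.

    Returns (None, -1.0) if log2 has fewer distinct events than log1.
    """
    ev1 = sorted({e for t in log1 for e in t})
    ev2 = sorted({e for t in log2 for e in t})
    best, best_score = None, -1.0
    for perm in permutations(ev2, len(ev1)):
        m = dict(zip(ev1, perm))
        s = score(m, log1, log2, patterns)
        if s > best_score:
            best, best_score = m, s
    return best, best_score


# Toy logs: the second log uses opaque IDs for the same underlying behavior.
log1 = [["A", "B", "C"], ["A", "B", "C"], ["A", "C"]]
log2 = [["x1", "x2", "x3"], ["x1", "x2", "x3"], ["x1", "x3"]]
patterns = [("A", "B"), ("B", "C"), ("A", "C")]

mapping, total = best_mapping(log1, log2, patterns)
print(mapping, total)
```

In this toy setting the event names carry no information, yet the pattern frequencies alone single out one mapping. A polynomial-time assignment method would suffice if the score decomposed over individual event pairs; under the assumptions of this sketch it does not, because each pattern couples several events at once.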

[19]  Philip S. Yu,et al.  Matching heterogeneous events with patterns , 2014, 2014 IEEE 30th International Conference on Data Engineering.
