Scalable Motif Detection and Aggregation

Motif search in graphs has become a popular field of research in recent years, motivated mainly by applications in bioinformatics. Existing work has focused on simple motifs: small sets of vertices directly connected by edges. However, some applications require a more general concept of motif, in which vertices are connected only indirectly, by paths. For this kind of motif, the size of the solution space is a major limiting factor. We address this challenge through motif instance aggregation. It turns out that effective parallel algorithms can be found to compute instances of generalised motifs in large graphs. To support this, we have developed GUERY, a tool for defining motifs and finding motif instances in graphs represented using the popular JUNG graph library [10]. GUERY consists of two parts: a simple domain-specific language for defining motifs, and a solver. Its main strengths are (1) support for motif instance aggregation, (2) generation of query result streams, as opposed to (very large) static sets of matching instances, and (3) support for effective parallelisation in the evaluation of queries. The examples used for validation originate from problems encountered when analysing the dependency graphs of object-oriented programs for instances of architectural antipatterns.
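To make the ideas of path-based motifs, instance aggregation, and result streams more concrete, the following is a minimal Python sketch. It is not GUERY's language, API, or algorithm: the graph, the motif (two classes in different packages that reach each other via paths), and the helper names `reachable`, `package_of`, `cyclic_pairs`, and `aggregate` are all hypothetical, and the search is a naive breadth-first traversal chosen for readability rather than performance.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterator, List, Set, Tuple

Graph = Dict[str, List[str]]  # adjacency list: vertex -> successors


def reachable(graph: Graph, start: str) -> Set[str]:
    """Vertices reachable from `start` by a path of length >= 1 (BFS)."""
    seen: Set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            queue.extend(graph.get(v, []))
    return seen


def package_of(vertex: str) -> str:
    """Hypothetical naming scheme: 'pkg.Class' belongs to package 'pkg'."""
    return vertex.rsplit(".", 1)[0]


def cyclic_pairs(graph: Graph) -> Iterator[Tuple[str, str]]:
    """Lazily yield (a, b) with a path a -> b and a path b -> a,
    where a and b are in different packages."""
    vertices = sorted(set(graph) | {w for ws in graph.values() for w in ws})
    for a in vertices:
        for b in sorted(reachable(graph, a)):
            if package_of(a) != package_of(b) and a in reachable(graph, b):
                yield (a, b)


def aggregate(instances: Iterator[Tuple[str, str]],
              key: Callable[[Tuple[str, str]], Hashable]
              ) -> Iterator[Tuple[str, str]]:
    """Stream only the first instance seen for each aggregation key."""
    seen: Set[Hashable] = set()
    for inst in instances:
        k = key(inst)
        if k not in seen:
            seen.add(k)
            yield inst


# A small dependency graph containing one cycle that spans two packages.
graph: Graph = {
    "ui.View": ["core.Model"],
    "core.Model": ["core.Store"],
    "core.Store": ["ui.Widget"],
    "ui.Widget": ["ui.View"],
}


def by_packages(inst: Tuple[str, str]) -> Tuple[str, str]:
    """Aggregation key: the pair of packages, ignoring the concrete classes."""
    return (package_of(inst[0]), package_of(inst[1]))


all_instances = list(cyclic_pairs(graph))
aggregated = list(aggregate(cyclic_pairs(graph), by_packages))
print(len(all_instances), "instances,", len(aggregated), "after aggregation")
```

In this toy graph, every class pair across the two packages is an instance of the motif, while aggregation by package pair keeps one representative per pair of packages. Because both functions are generators, a consumer can stop reading after the first few results without the full instance set ever being materialised.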

[1] Markku Sakkinen, et al. Disciplined Inheritance, 1989, ECOOP.

[2] Arthur J. Riel, et al. Object-Oriented Design Heuristics, 1996.

[3] Philip S. Yu, et al. Fast Graph Pattern Matching, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[4] Terence Parr. The Definitive ANTLR Reference: Building Domain-Specific Languages, 2007.

[5] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[6] J. Davenport. Editor, 1960.

[7] Glenford J. Myers, et al. Structured Design, 1974, IBM Syst. J.

[8] Jens Dietrich, et al. Barriers to Modularity - An Empirical Study to Assess the Potential for Modularisation of Java Programs, 2010, QoSA.

[9] H. V. Jagadish, et al. A compression technique to materialize transitive closure, 1990, TODS.

[10] E. Prud'hommeaux, et al. SPARQL query language for RDF, 2011.

[11] Toufik Taibi. Design Pattern Formalization Techniques, 2007.

[12] Ralph Johnson, et al. Design Patterns: Elements of Reusable Object-Oriented Software, 2019.

[13] Kevin Lano, et al. Design Patterns Formalization Techniques, 2007.

[14] Jing Li, et al. The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies, 2010, 2010 Asia Pacific Software Engineering Conference.

[15] Dirk Beyer, et al. Efficient relational calculation for software analysis, 2005, IEEE Transactions on Software Engineering.

[16] S. Shen-Orr, et al. Network motifs: simple building blocks of complex networks, 2002, Science.

[17] Lei Zou, et al. DistanceJoin: Pattern Match Query In a Large Graph Database, 2009, Proc. VLDB Endow.

[18] S. Shen-Orr, et al. Network Motifs: Simple Building Blocks of Complex Networks, 2002.

[19] Robert E. Tarjan, et al. Depth-First Search and Linear Graph Algorithms, 1972, SIAM J. Comput.

[20] Ghan Bir Singh, et al. Single versus multiple inheritance in object oriented programming, 1995, OOPS.