SUMMA: subgraph matching in massive graphs

Graphs can represent many kinds of data, e.g., online social networks, internet links, and procedure dependency graphs. Indexing massive graphs is an urgent research problem of great practical importance. The main challenge is size: a single graph may contain tens of millions of vertices or more. A database graph this large may not fit in working memory, which increases processing time significantly. We propose a novel index-based subgraph matching scheme, SUMMA, for querying massive graphs. We devise two novel indices that capture both local and global information of the database graph. SUMMA is further optimized by a matching scheme that reduces redundant calculations and disk accesses. Finally, we evaluate the efficiency and scalability of the proposed method on a number of synthetic datasets.