Graphs can represent many kinds of data, e.g., online social networks, internet links, and procedure dependency graphs. Indexing massive graphs is therefore an urgent research problem of great practical importance. The main challenge is size: a single graph may contain tens of millions of vertices or more. Such a database graph may not fit in working memory, which significantly increases processing time.
We propose a novel index-based subgraph matching scheme, SUMMA, for querying massive graphs. We devise two novel indices that capture both local and global information of the database graph. SUMMA is further optimized by a matching scheme that reduces redundant calculations and disk accesses. Finally, we evaluate the efficiency and scalability of the proposed method on a number of synthetic datasets.