Learning to rank graphs for online similar graph search

Many applications in structure matching require the ability to search for graphs that are similar to a query graph, i.e., similar graph queries. Prior work, especially in chemoinformatics, has used the maximum common edge subgraph (MCEG) to compute graph similarity, but this approach is prohibitively slow for real-time queries. In this work, we propose an algorithm that extracts and indexes subgraph features from a graph dataset and computes the similarity of graphs using a linear graph kernel whose feature weights are learned offline from a training set generated using MCEG. We show empirically that our learning-to-rank algorithm can achieve higher normalized discounted cumulative gain (NDCG) than existing optimal methods based on MCEG, and that it runs orders of magnitude faster than these methods.
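To make the online scoring step concrete, the following is a minimal sketch, not the paper's implementation. It assumes each graph has already been reduced to a bag of subgraph-feature counts and that a weight per feature has been learned offline. The feature names, the weight values, and the helper functions `build_index`, `rank`, and `ndcg` are all hypothetical. The NDCG variant shown uses the common 2^r - 1 gain with a log2 discount, which may differ from the variant used in the paper's evaluation.

```python
import math
from collections import defaultdict


def build_index(graph_features):
    """Build an inverted index: feature -> {graph_id: count}."""
    index = defaultdict(dict)
    for graph_id, feats in graph_features.items():
        for feature, count in feats.items():
            index[feature][graph_id] = count
    return index


def rank(query_feats, index, weights, k=10):
    """Score graphs with a weighted linear kernel, sum_f w_f * q_f * g_f.

    Only graphs sharing at least one weighted feature with the query
    are touched, which is what makes the index useful at query time.
    """
    scores = defaultdict(float)
    for feature, q_count in query_feats.items():
        w = weights.get(feature, 0.0)
        if w == 0.0:
            continue
        for graph_id, g_count in index.get(feature, {}).items():
            scores[graph_id] += w * q_count * g_count
    # Sort by descending score, breaking ties by graph id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def ndcg(relevances, k):
    """NDCG@k for a ranked list of graded relevance labels."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Toy data: feature names and weights are illustrative assumptions.
graphs = {
    "g1": {"C-C": 2, "C-O": 1},
    "g2": {"C-C": 1},
    "g3": {"C-N": 3},
}
weights = {"C-C": 0.5, "C-O": 2.0, "C-N": 1.0}
query = {"C-C": 1, "C-O": 1}

index = build_index(graphs)
ranking = rank(query, index, weights)
```

In this sketch the expensive part (learning `weights` from MCEG-derived training rankings) happens offline, while a query only requires index lookups and a weighted sum.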
