Substructure Similarity Measurement in Chinese Recipes

Improving the precision of information retrieval has been a challenging issue on the Chinese Web. Chinese recipes on the Web are a good example: it is neither easy nor natural for people to search for recipes using keywords (e.g., recipe names), because the names can be so abstract that they carry little, if any, information about the underlying ingredients or cooking methods. In this paper, we investigate the underlying features of Chinese recipes and, based on their workflow-like cooking procedures, model recipes as graphs. We further propose a novel similarity measurement based on frequent patterns, and devise an effective filtering algorithm that prunes unrelated data to support efficient online search. Because recipes are represented as graphs, frequent common patterns can be mined from a cooking graph database. In our prototype system, RecipeView, we therefore extend the subgraph mining algorithm FSG to cooking graphs and combine it with the proposed similarity measurement, yielding an approach that caters well to users' specific needs. Our initial experimental studies show that the filtering algorithm efficiently prunes unrelated cooking graphs without affecting retrieval performance, and that the similarity measurement achieves relatively higher precision and recall than its counterparts.
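The abstract does not give the similarity formula or the filtering rule, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each recipe is reduced to a set of labeled cooking-flow edges. The "frequent patterns" are just single edges that appear in at least `min_support` recipes; a full FSG-style miner would go on to grow larger connected subgraphs level by level. Similarity is taken to be the Jaccard overlap of the frequent patterns two graphs contain, and the filter drops recipes that share fewer than `min_shared` patterns with the query. All function names, the sample recipes, and the Jaccard choice are hypothetical and are not taken from RecipeView.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical model: a cooking graph is a set of directed edges (u, v),
# meaning the output of ingredient or step u flows into cooking action v.
# Nodes are identified only by their label (a simplification).
Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def recipe_graph(edges: List[Edge]) -> Graph:
    """Build a cooking graph from (source, action) pairs."""
    return frozenset(edges)


def frequent_edges(db: Dict[str, Graph], min_support: int) -> Set[Edge]:
    """Single-edge patterns contained in at least `min_support` graphs.

    This is only the first level of a level-wise subgraph miner; it does
    not extend patterns to larger subgraphs as FSG would.
    """
    support = Counter(edge for graph in db.values() for edge in graph)
    return {edge for edge, count in support.items() if count >= min_support}


def features(graph: Graph, patterns: Set[Edge]) -> Set[Edge]:
    """The frequent patterns that occur in `graph`."""
    return set(graph) & patterns


def similarity(query: Graph, graph: Graph, patterns: Set[Edge]) -> float:
    """Assumed measure: Jaccard overlap of frequent-pattern features."""
    fq, fg = features(query, patterns), features(graph, patterns)
    union = fq | fg
    return len(fq & fg) / len(union) if union else 0.0


def filter_candidates(query: Graph, db: Dict[str, Graph],
                      patterns: Set[Edge], min_shared: int = 1) -> Dict[str, Graph]:
    """Prune graphs sharing fewer than `min_shared` patterns with the query."""
    fq = features(query, patterns)
    return {name: g for name, g in db.items()
            if len(fq & features(g, patterns)) >= min_shared}


def search(query: Graph, db: Dict[str, Graph], patterns: Set[Edge],
           min_shared: int = 1) -> List[Tuple[float, str]]:
    """Rank the candidates that survive filtering, best match first."""
    candidates = filter_candidates(query, db, patterns, min_shared)
    ranked = [(similarity(query, g, patterns), name) for name, g in candidates.items()]
    return sorted(ranked, reverse=True)


# Hypothetical sample data, for illustration only.
DB: Dict[str, Graph] = {
    "tomato_egg": recipe_graph([("egg", "beat"), ("beat", "stir-fry"), ("tomato", "cut"),
                                ("cut", "stir-fry"), ("stir-fry", "season")]),
    "egg_fried_rice": recipe_graph([("egg", "beat"), ("beat", "stir-fry"),
                                    ("rice", "stir-fry"), ("stir-fry", "season")]),
    "tomato_soup": recipe_graph([("tomato", "cut"), ("cut", "boil"), ("boil", "season")]),
    "steamed_fish": recipe_graph([("fish", "clean"), ("clean", "steam"),
                                  ("ginger", "cut"), ("cut", "steam")]),
}
QUERY: Graph = recipe_graph([("egg", "beat"), ("beat", "stir-fry"),
                             ("tomato", "cut"), ("cut", "stir-fry")])

if __name__ == "__main__":
    patterns = frequent_edges(DB, min_support=2)
    for score, name in search(QUERY, DB, patterns):
        print(f"{name}: {score:.2f}")
    # prints tomato_egg: 0.75, egg_fried_rice: 0.50, tomato_soup: 0.33 (one per line)
```

With `min_shared=1`, every pruned recipe (here `steamed_fish`) shares no pattern with the query and would have scored 0 anyway, so the filter shrinks the candidate set without changing the ranking. That is a much simplified analogue of the property the abstract reports for its own filtering algorithm; a larger `min_shared` prunes more aggressively but can drop weak matches.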
