Mining and Explaining Relationships in Wikipedia

Mining and explaining relationships between objects are challenging tasks in the field of knowledge search. We propose a new approach to these tasks that uses disjoint paths formed by links in Wikipedia. To realize this approach, we propose a naive method and a generalized-flow-based method, together with a technique for avoiding flow confluences that forces a generalized flow to be as disjoint as possible. We also apply the approach to the classification of relationships. Our experiments reveal that the generalized-flow-based method can mine many disjoint paths that are important for a relationship, and that the classification is effective for explaining relationships.
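To make the notion of disjoint paths concrete, the following is a minimal sketch that mines a maximum set of edge-disjoint directed paths between two pages using an ordinary unit-capacity maximum flow. It is an illustration only, not the generalized-flow-based method of the paper: the function name `edge_disjoint_paths`, the page names, the toy link list, and the choice of edge-disjointness (rather than vertex-disjointness) are all assumptions made for this example.

```python
from collections import deque


def edge_disjoint_paths(links, source, target):
    """Return a maximum set of edge-disjoint simple paths from source to target.

    `links` is an iterable of directed (from_page, to_page) pairs.
    Hypothetical helper: plain unit-capacity max flow, not a generalized flow.
    """
    if source == target:
        raise ValueError("source and target must differ")

    link_set = set(links)
    cap = {}  # residual capacity of each directed pair
    adj = {}  # residual neighbours (both directions of every link)
    for u, v in link_set:
        cap[(u, v)] = 1
        cap.setdefault((v, u), 0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def find_augmenting_path():
        # Breadth-first search over edges with remaining residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if u == target:
                return parent
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return None

    # Push one unit of flow along each augmenting path.
    while (parent := find_augmenting_path()) is not None:
        v = target
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    # A link carries flow exactly when its residual capacity dropped to zero.
    out = {}
    for u, v in link_set:
        if cap[(u, v)] == 0:
            out.setdefault(u, []).append(v)

    # Decompose the flow into paths, cutting out any loop that appears.
    paths = []
    while out.get(source):
        path = [source]
        while path[-1] != target:
            nxt = out[path[-1]].pop()
            if nxt in path:
                del path[path.index(nxt) + 1:]
            else:
                path.append(nxt)
        paths.append(path)
    return paths


# Toy link graph with made-up page names.
toy_links = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("B", "C")]
toy_paths = edge_disjoint_paths(toy_links, "A", "D")
```

In this toy graph the link from B to C is left unused, because routing through it would make two paths share a link. A generalized flow additionally attaches a gain to each edge, which this sketch does not model.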
