Analysis of Implicit Relations on Wikipedia: Measuring Strength through Mining Elucidatory Objects

We focus on measuring relations between pairs of objects in Wikipedia, where each page can be regarded as an individual object. Two kinds of relations between two objects exist in Wikipedia: an explicit relation is represented by a single link between the two pages for the objects, and an implicit relation is represented by a link structure containing the two pages. Previously proposed methods are inadequate for measuring implicit relations because they use only one or two of the following three important factors: distance, connectivity, and co-citation. We propose a new method that reflects all three factors by using a generalized maximum flow. We confirm that our method measures the strength of a relation more appropriately than the previously proposed methods do. Another notable aspect of our method is that it mines elucidatory objects, that is, objects constituting a relation. We explain that mining elucidatory objects opens a novel way to understand a relation in depth.
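To make the role of a generalized maximum flow more concrete, the following is a minimal sketch, not the authors' implementation. In a generalized flow network every edge has a gain, and the flow entering an edge is multiplied by that gain when it leaves. With gains below 1, a longer path delivers less flow to the destination (distance), while additional paths deliver additional flow (connectivity). The abstract does not state how capacities and gains are chosen or how co-citation is encoded, so the unit capacities, the gain of 0.5, the toy graph, and the function name `generalized_max_flow` are all assumptions made for illustration. The sketch repeatedly augments along the highest-gain residual path, a classical scheme for this problem, and assumes every gain lies in (0, 1] and the graph has no self-loops.

```python
import math

EPS = 1e-12  # tolerance for treating a residual capacity as exhausted


def generalized_max_flow(n, edges, source, sink, max_rounds=10_000):
    """Return the amount of flow that arrives at `sink`.

    edges: iterable of (u, v, capacity, gain) with 0 < gain <= 1 (assumed).
    Capacity limits the flow entering the edge at u; flow * gain leaves at v.
    """
    # Residual arc layout: [head, residual_capacity, gain, index_of_reverse_arc]
    graph = [[] for _ in range(n)]
    for u, v, cap, gain in edges:
        graph[u].append([v, float(cap), gain, len(graph[v])])
        graph[v].append([u, 0.0, 1.0 / gain, len(graph[u]) - 1])

    arrived = 0.0
    for _round in range(max_rounds):
        # Bellman-Ford with cost -log(gain): the cheapest path has the highest gain.
        dist = [math.inf] * n
        prev = [None] * n
        dist[source] = 0.0
        for _pass in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == math.inf:
                    continue
                for i, (v, cap, gain, _rev) in enumerate(graph[u]):
                    cost = dist[u] - math.log(gain)
                    if cap > EPS and cost < dist[v] - EPS:
                        dist[v] = cost
                        prev[v] = (u, i)
                        changed = True
            if not changed:
                break
        if prev[sink] is None:
            break  # no augmenting path is left

        # Recover the path from source to sink.
        path = []
        v = sink
        while v != source:
            u, i = prev[v]
            path.append(graph[u][i])
            v = u
        path.reverse()

        # Largest amount that can leave the source without exceeding any capacity.
        delta, mult = math.inf, 1.0
        for arc in path:
            delta = min(delta, arc[1] / mult)
            mult *= arc[2]

        # Push the flow, applying each gain and updating the reverse arcs.
        amount = delta
        for arc in path:
            head, _cap, gain, rev = arc
            arc[1] -= amount
            graph[head][rev][1] += amount * gain
            amount *= gain
        arrived += amount
    return arrived


# Toy graph (hypothetical): 0 = source page, 1 = intermediate page, 2 = destination page.
toy_edges = [(0, 2, 1.0, 0.5), (0, 1, 1.0, 0.5), (1, 2, 1.0, 0.5)]
print(generalized_max_flow(3, toy_edges, 0, 2))  # 0.75
```

In this toy graph the direct link delivers 0.5 units and the two-link path delivers 0.25, so the value grows with the number of connecting paths and shrinks with their length.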
