Building Similar Link Network in Large-Scale Web Resources

Similar Link Network (SiLN) is a semantic overlay on Web resources that connects them through similarity relations rather than hyperlinks, with the aim of providing rich semantics for intelligent Web activities. However, building a SiLN by cosine computation over large-scale Web resources is difficult because of the high time complexity of construction and the weak connectivity of the resulting network. Three strategies are proposed to address these issues. First, a divide-and-conquer strategy partitions the large-scale Web resources into many roughly similar communities, which significantly reduces the time complexity of building the SiLN. Second, a multi-level network structure is designed to manage the large-scale Web resources effectively and to guarantee the SiLN's connectivity. Finally, a two-level feedback strategy with isolated resources is developed to improve the accuracy of SiLN construction. Experimental results show that the proposed method is feasible and efficient, offering low complexity, good connectivity, and high precision.
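To illustrate why partitioning into communities lowers the building cost, the following is a minimal Python sketch, not the authors' implementation. It assumes resources are plain-text documents represented as term-frequency vectors, that the rough communities are already given, and that a similar link is created when cosine similarity reaches a threshold. The names `cosine`, `build_links`, `build_siln`, and the threshold value of 0.5 are hypothetical; the multi-level structure and the feedback strategy are not modeled here.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_links(vectors, ids, threshold):
    """Link every pair among `ids` whose similarity reaches the threshold."""
    links = set()
    for i, j in combinations(ids, 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            links.add((i, j))
    return links


def build_siln(docs, communities, threshold=0.5):
    """Build similar links inside each community only (assumed partition).

    Comparing pairs within communities instead of across the whole
    collection is what reduces the number of cosine computations.
    """
    vectors = {doc_id: Counter(text.lower().split())
               for doc_id, text in docs.items()}
    links = set()
    for members in communities:
        links |= build_links(vectors, sorted(members), threshold)
    return links


# Hypothetical toy data: four resources in two rough communities.
docs = {
    "a": "semantic web link network",
    "b": "semantic link network model",
    "c": "k-means clustering algorithm",
    "d": "clustering algorithm for k-means",
}
communities = [{"a", "b"}, {"c", "d"}]
print(sorted(build_siln(docs, communities)))  # [('a', 'b'), ('c', 'd')]
```

In this toy case only 2 pairs are compared instead of the 6 needed for a full pairwise scan; with n resources split into k communities of roughly equal size, the number of comparisons drops from about n²/2 to about n²/(2k).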
