A hierarchical co-clustering approach for entity exploration over Linked Data

Abstract With the increasing amount of Linked Data on the Web, the large number of linked entities often makes it difficult for users to quickly find the entities of interest for further exploration. Clustering, as a fundamental approach, has been adopted to organize entities into meaningful groups. In general, links and entity classes are semantically labelled and can be used to group linked entities. However, entities are usually associated with many links and classes. To avoid information overload, we propose a novel hierarchical co-clustering approach that groups links and entity classes simultaneously. In our approach, we define a measure of intra-link similarity and a measure of intra-class similarity, and then incorporate both into co-clustering. The proposed approach is implemented in a Linked Data browser called CoClus. We compare it with three other browsers in a task-based user study, and the experimental results show that our approach provides useful support for entity exploration. We also compare our algorithm with three baseline co-clustering algorithms, and the experimental results indicate that it outperforms the baselines in terms of the Clustering Index score.
