SHINE: Search Heterogeneous Interrelated Entities

Heterogeneous entities are common and, in many scenarios, interrelated. For example, typical Web search activities involve multiple types of interrelated entities such as end users, Web pages, and search queries. In this paper, we define and study a novel problem: Search Heterogeneous INterrelated Entities (SHINE). Given a SHINE-query, which can consist of entities of any type(s), the task of SHINE is to retrieve multiple types of related entities to answer the query. This is in contrast to traditional search, which deals with only a single type of entity (e.g., Web pages). The advantages of SHINE include: (1) Because queries of different types are accepted, end users can specify their information need along different dimensions. (2) Answering a query with multiple types of entities provides informative context that helps users better understand the search results and facilitates their information exploration. (3) Multiple relations among heterogeneous entities can be utilized to improve the ranking of any particular type of entity. To attain the goal of SHINE, we propose to represent all entities in a unified space by utilizing their interaction relationships. Two approaches, M-LSA and E-VSM, are discussed and compared in this paper. Experiments on three data sets (a literature data set, a search engine log data set, and a recommendation data set) show the effectiveness and flexibility of our proposed methods.
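To make the idea of a unified space concrete, the following is a minimal sketch under stated assumptions, not the paper's M-LSA or E-VSM. It uses a plain vector-space construction in which every entity is a sparse vector over the entities it interacts with, and ranks candidates of each type by cosine similarity to the query. The toy relations, the type-prefixed entity ids, and the names `build_vectors`, `cosine`, and `shine_search` are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy interaction data: (entity_a, entity_b, weight).
# The "user:", "query:", "page:" prefixes are an illustrative convention only.
RELATIONS = [
    ("user:alice", "query:jaguar speed", 2.0),
    ("user:alice", "query:big cats", 1.0),
    ("user:bob", "query:jaguar price", 3.0),
    ("query:jaguar speed", "page:wildlife.example", 2.0),
    ("query:big cats", "page:wildlife.example", 1.0),
    ("query:jaguar price", "page:cars.example", 3.0),
]


def build_vectors(relations):
    """Represent every entity as a sparse vector over the entities it interacts with."""
    vectors = defaultdict(dict)
    for a, b, weight in relations:
        vectors[a][b] = vectors[a].get(b, 0.0) + weight
        vectors[b][a] = vectors[b].get(a, 0.0) + weight
    # Give each entity a unit component on its own axis, so that two
    # directly linked entities share at least one dimension.
    for entity in list(vectors):
        vectors[entity][entity] = 1.0
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def entity_type(entity):
    """Read the type from the id prefix (an assumption of this sketch)."""
    return entity.split(":", 1)[0]


def shine_search(query_entities, vectors, top_k=3):
    """Rank entities of every type against a query made of one or more entities."""
    query_vec = defaultdict(float)
    for entity in query_entities:
        for key, weight in vectors[entity].items():
            query_vec[key] += weight
    results = defaultdict(list)
    for entity, vec in vectors.items():
        if entity in query_entities:
            continue
        results[entity_type(entity)].append((cosine(query_vec, vec), entity))
    return {t: sorted(scored, reverse=True)[:top_k] for t, scored in results.items()}


vectors = build_vectors(RELATIONS)
results = shine_search(["user:alice"], vectors)
for kind, ranked in results.items():
    print(kind, ranked)
```

In this toy setting a query consisting of a single user returns ranked lists of queries, pages, and other users at once, which mirrors the multi-type answers described above. A latent-space method would replace the raw interaction vectors with lower-dimensional ones, but that step is outside this sketch.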
