Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings

Expert finding is an important task in both industry and academia, and ranking candidates with appropriate expertise for diverse queries is challenging. In addition, objects of different types interact with one another, naturally forming heterogeneous information networks. We study expert finding in heterogeneous bibliographic networks from two aspects: textual content analysis and authority ranking. For textual content analysis, we propose a new query expansion method based on locally-trained embedding learning guided by a concept hierarchy, tailored in particular to specific queries with narrow semantic meanings. Compared with global embedding learning, locally-trained embedding learning projects terms into a latent semantic space constrained to relevant topics, and therefore preserves more precise and subtle information for specific queries. For candidate ranking, the heterogeneous information network structure, largely ignored in previous studies of expert finding, provides additional information: different types of interactions among objects play different roles. We propose a ranking algorithm that estimates the authority of objects in the network, treating each type of strongly-typed edge individually. To demonstrate the effectiveness of the proposed framework, we apply it to a large-scale bibliographic dataset with over two million entries and one million researcher candidates. Experimental results show that the proposed framework outperforms existing methods for both general and specific queries.
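The idea of treating each edge type individually during authority ranking can be illustrated with a small sketch. This is not the ranking algorithm of the paper, whose details are not given here; it is a generic PageRank-style propagation in which every edge type is normalized separately and carries its own weight. The function name `rank_authority`, the edge types `cites` and `written_by`, the weights, and the toy network are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_authority(edges, type_weights, iterations=50, damping=0.85):
    """Hypothetical typed-edge authority propagation (illustrative only).

    edges: list of (source, target, edge_type); score flows source -> target.
    type_weights: assumed per-edge-type weights, e.g. {"cites": 1.0}.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    score = {n: 1.0 / len(nodes) for n in nodes}

    # Count out-edges per (node, edge type) so that each edge type
    # is normalized on its own rather than mixed with the others.
    out_count = defaultdict(int)
    for s, _, etype in edges:
        out_count[(s, etype)] += 1

    for _ in range(iterations):
        # Uniform teleport term, as in standard PageRank.
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for s, t, etype in edges:
            share = score[s] / out_count[(s, etype)]
            new[t] += damping * type_weights[etype] * share
        # Renormalize so the scores stay a probability distribution.
        total = sum(new.values())
        score = {n: v / total for n, v in new.items()}
    return score


# Toy bibliographic network: p2 and p3 both cite p1.
edges = [
    ("p2", "p1", "cites"),
    ("p3", "p1", "cites"),
    ("p1", "alice", "written_by"),
    ("p2", "bob", "written_by"),
    ("p3", "carol", "written_by"),
]
scores = rank_authority(edges, {"cites": 1.0, "written_by": 1.0})
for name in ("alice", "bob", "carol"):
    print(name, round(scores[name], 4))
```

In this toy network the author of the cited paper (`alice`) ends up with a higher score than the authors of the citing papers. Changing the entries of `type_weights` shifts how much authority flows along each kind of relation, which is the effect the per-edge-type treatment is meant to capture.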
