Integration of scientific and social networks

In this paper, we address the problem of scientific-social network integration: finding a matching relationship between members of two such networks (here, the DBLP publication network and the Twitter social network). This task is a crucial step toward building a multi-environment expert finding system, a topic that has recently attracted much attention in the Information Retrieval community. We divide the problem into two subproblems. The first is to find those profiles in one network that presumably have a corresponding profile in the other network. The second is name disambiguation: identifying the true matching profiles among the candidate profiles. Using several name similarity patterns and contextual properties of these networks, we design a focused crawler to find highly probable matching pairs; name disambiguation then reduces to predicting the label of each candidate pair as either a true or a false match. Because the labels of these candidate pairs are not independent, state-of-the-art classification methods such as logistic regression and decision trees, which classify each instance separately, are unsuitable for this task. By defining a matching dependency graph, we propose a joint label prediction model that determines the labels of all candidate pairs simultaneously. The model considers two main types of dependencies among candidate pairs, both of which are quite intuitive and general. Following a discriminative approach, we use various feature sets to train the proposed classifiers. An extensive set of experiments has been conducted on six test collections gathered from the DBLP and Twitter networks to show the effectiveness of the proposed joint label prediction model.
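The difference between per-instance classification and joint label prediction can be illustrated with a minimal Python sketch. It is not the model proposed in the paper: the abstract does not specify the two dependency types, the features, or the inference procedure, so the sketch assumes a generic pairwise scoring scheme over a tiny dependency graph and finds the best joint labeling by exhaustive search. The function names, the pair identifiers, and all scores are hypothetical.

```python
from itertools import product


def independent_predict(unary):
    """Label each candidate pair on its own score, ignoring dependencies."""
    return {pair: score > 0 for pair, score in unary.items()}


def joint_predict(unary, edges):
    """Return the labeling of all pairs that maximizes the total score.

    unary: dict pair_id -> score for labeling that pair True (False scores 0).
    edges: dict (pair_a, pair_b) -> score added when BOTH pairs are True.
    Exhaustive search is exponential; it is only meant for toy graphs.
    """
    pairs = sorted(unary)
    best_labels, best_score = None, float("-inf")
    for assignment in product([False, True], repeat=len(pairs)):
        labels = dict(zip(pairs, assignment))
        score = sum(unary[p] for p in pairs if labels[p])
        score += sum(w for (a, b), w in edges.items() if labels[a] and labels[b])
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels


# Hypothetical candidate pairs (each is one DBLP profile paired with one
# Twitter profile) and their per-pair scores from some local classifier.
unary = {"A": 1.0, "B": 0.6, "C": -0.3}

# Assumed dependencies: A and B compete for the same profile (negative
# weight), while A and C support each other (positive weight).
edges = {("A", "B"): -2.0, ("A", "C"): 0.8}

print(independent_predict(unary))   # A and B accepted, C rejected
print(joint_predict(unary, edges))  # A and C accepted, B rejected
```

On this toy input the independent classifier accepts both competing pairs A and B, whereas the joint search drops B and accepts the weakly scored pair C because it is supported by A. A realistic system would replace the exhaustive search with approximate inference over the dependency graph.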
