Using time topic modeling for semantics-based dynamic research interest finding

Finding researchers' interests has been an active area of investigation for different recommendation tasks. Previous approaches for finding researchers' interests exploit writing styles and link connectivity while considering the time of documents, but they ignore the semantics-based intrinsic structure of words. Consequently, a topic model named the Author-Topic (AT) model was proposed, which exploits the semantics-based intrinsic structure of words present between the authors of research papers. However, it does not model the time factor simultaneously, which results in the exchangeability of topics problem, an important factor to deal with when finding dynamic research interests. For example, many real-world applications, such as finding reviewers for papers and finding taggers in social tagging systems, need to consider different time periods. In this paper, we present a time topic modeling approach named Temporal-Author-Topic (TAT), which simultaneously models the text, researchers, and time of research papers to overcome the exchangeability of topics problem. The mixture distribution over topics is influenced by both the co-occurrences of words and the timestamps of the research papers. Consequently, the occurrence of topics and their related researchers change over time, while the meaning of a particular topic remains almost unchanged. The proposed approach is used to discover topically related researchers for different time periods. We also show how their interests and relationships change over a time period. Empirical results on a large corpus of research papers show the effectiveness of the proposed approach and its superiority over the AT model: by handling the exchangeability of topics problem, TAT obtains a similar meaning for a particular topic over time.
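To make the idea of jointly modeling text, researchers, and time more concrete, the following is a minimal sketch of a collapsed Gibbs sampler for a TAT-like model. It is an illustration under stated assumptions, not the authors' implementation: the function name `tat_gibbs`, the symmetric hyperparameters `alpha`, `beta`, and `gamma`, the input layout, and the toy corpus are all hypothetical. The sketch assumes that each word token carries the timestamp of its paper and that an (author, topic) pair is sampled per token with a weight that multiplies a word term, a timestamp term, and an author-topic term.

```python
import random
from collections import defaultdict


def tat_gibbs(docs, n_topics, n_iters=50, alpha=0.1, beta=0.01, gamma=0.1, seed=0):
    """Sketch of a collapsed Gibbs sampler for a TAT-like model (assumed form).

    docs: list of (authors, year, words) tuples, one per paper.
    Returns (theta, psi): author->topic and topic->year distributions.
    """
    rng = random.Random(seed)
    vocab = {w for _, _, words in docs for w in words}
    years = sorted({year for _, year, _ in docs})
    all_authors = sorted({a for authors, _, _ in docs for a in authors})
    V, Y, T = len(vocab), len(years), n_topics

    n_at = defaultdict(int)  # tokens assigned to (author, topic)
    n_a = defaultdict(int)   # tokens assigned to author
    n_tw = defaultdict(int)  # tokens of word w assigned to (topic, word)
    n_ty = defaultdict(int)  # tokens with timestamp y assigned to (topic, year)
    n_t = [0] * T            # tokens assigned to each topic

    def bump(a, z, w, y, delta):
        n_at[a, z] += delta
        n_a[a] += delta
        n_tw[z, w] += delta
        n_ty[z, y] += delta
        n_t[z] += delta

    # Random initialisation: one (author, topic) pair per token.
    assign = []
    for authors, year, words in docs:
        doc_assign = []
        for w in words:
            a, z = rng.choice(authors), rng.randrange(T)
            bump(a, z, w, year, +1)
            doc_assign.append((a, z))
        assign.append(doc_assign)

    for _ in range(n_iters):
        for d, (authors, year, words) in enumerate(docs):
            for i, w in enumerate(words):
                a, z = assign[d][i]
                bump(a, z, w, year, -1)  # remove the current assignment
                candidates = [(a2, z2) for a2 in authors for z2 in range(T)]
                # Assumed weight: word term * timestamp term * author-topic term.
                weights = [
                    (n_tw[z2, w] + beta) / (n_t[z2] + V * beta)
                    * (n_ty[z2, year] + gamma) / (n_t[z2] + Y * gamma)
                    * (n_at[a2, z2] + alpha) / (n_a[a2] + T * alpha)
                    for a2, z2 in candidates
                ]
                a, z = rng.choices(candidates, weights=weights, k=1)[0]
                bump(a, z, w, year, +1)
                assign[d][i] = (a, z)

    theta = {
        a: [(n_at[a, t] + alpha) / (n_a[a] + T * alpha) for t in range(T)]
        for a in all_authors
    }
    psi = [
        {y: (n_ty[t, y] + gamma) / (n_t[t] + Y * gamma) for y in years}
        for t in range(T)
    ]
    return theta, psi


# Hypothetical toy corpus: (authors, publication year, tokenised text).
toy_docs = [
    (["alice", "bob"], 2004, ["topic", "model", "author", "topic"]),
    (["alice"], 2005, ["topic", "model", "time", "model"]),
    (["carol"], 2004, ["graph", "network", "link", "graph"]),
    (["carol", "bob"], 2006, ["network", "link", "time", "graph"]),
]
theta, psi = tat_gibbs(toy_docs, n_topics=2, n_iters=30)
```

Here `theta[author]` approximates a researcher's interest distribution over topics, and `psi[topic]` approximates how strongly a topic is associated with each year. Because the timestamp term enters every sampling step, a topic's word distribution is tied to when its words occur, which is the intuition behind keeping a topic's meaning stable across periods.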
