Discovery of topic flows of authors

As the number of Web documents has grown, so has the number of methods proposed for knowledge discovery on them. Web documents do not always provide keywords or categories, so unsupervised approaches are desirable, and topic modeling is one such approach that discovers knowledge without labels. Web documents also usually carry time information, such as publication years, so incorporating this information makes it possible to capture how knowledge patterns change over time. These temporal patterns can support useful services such as graphs of research trends, finding authors similar to a given author (potential co-authors), or finding the top researchers in a specific research domain. In this paper, we propose a new topic model, the Author Topic-Flow (ATF) model, whose objective is to capture the temporal patterns of authors' research interests, where each topic is associated with a research domain. The state-of-the-art model, the Temporal Author Topic model, has the same objective, but it computes the temporal patterns of authors by combining the patterns of topics. We believe that such 'indirect' temporal patterns will be poorer than the 'direct' temporal patterns of our proposed model. The ATF model gives each author a separate variable that models the temporal patterns, which we call a 'direct' topic flow. The design of the ATF model rests on the hypothesis that 'direct' topic flows are better than 'indirect' topic flows. We support this hypothesis with a structural comparison of the two models and demonstrate the effectiveness of the ATF model with empirical results.
