Group topic modeling for academic knowledge discovery

Conference mining and expert finding are useful academic knowledge discovery problems from an academic recommendation point of view. Group level (GL) topic modeling can provide us with richer text semantics and relationships, which results in denser topics. And denser topics are more useful for academic discovery issues in contrast to Element level (EL) or Document level (DL) topic modeling, which produces sparser topics. Previous methods performed academic knowledge discovery by using network connectivity (only links not text of documents), keywords-based matching (no semantics) or by using semantics-based intrinsic structure of the words presented between documents (semantics at DL), while ignoring semantics-based intrinsic structure of the words and relationships between conferences (semantics at GL). In this paper, we consider semantics-based intrinsic structure of words and relationships presented in conferences (richer text semantics and relationships) by modeling from GL. We propose group topic modeling methods based on Latent Dirichlet Allocation (LDA). Detailed empirical evaluation shows that our proposed GL methods significantly outperformed DL methods for conference mining and expert finding problems.

[1]  Jörg Kindermann,et al.  Authorship Attribution with Support Vector Machines , 2003, Applied Intelligence.

[2]  Michael Ley,et al.  The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspectives , 2002, SPIRE.

[3]  Juan-Zi Li,et al.  Recommendation over a Heterogeneous Social Network , 2008, 2008 The Ninth International Conference on Web-Age Information Management.

[4]  Jie Tang,et al.  ArnetMiner: extraction and mining of academic social networks , 2008, KDD.

[5]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[6]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[7]  Yoav Shoham,et al.  Fab: content-based, collaborative recommendation , 1997, CACM.

[8]  Yoav Shoham,et al.  Content-Based, Collaborative Recommendation. , 1997 .

[9]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Thomas L. Griffiths,et al.  The Author-Topic Model for Authors and Documents , 2004, UAI.

[11]  Andrew McCallum,et al.  Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.

[12]  C. J. van Rijsbergen,et al.  Investigating the relationship between language model perplexity and IR precision-recall measures , 2003, SIGIR.

[13]  C. Lee Giles,et al.  Clustering and identifying temporal trends in document databases , 2000, Proceedings IEEE Advances in Digital Libraries 2000.

[14]  Sushil Krishna Bajracharya,et al.  Mining Eclipse Developer Contributions via Author-Topic Models , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[15]  Bernardo A. Huberman,et al.  Email as spectroscopy: automated discovery of community structure within organizations , 2003 .

[16]  Andrew McCallum,et al.  Topics over time: a non-Markov continuous-time model of topical trends , 2006, KDD '06.

[17]  Juan-Zi Li,et al.  Knowledge discovery through directed probabilistic topic models: a survey , 2010, Frontiers of Computer Science in China.

[18]  Philip K. Chan,et al.  Learning implicit user interest hierarchy for context in personalization , 2008, IUI '03.

[19]  S.,et al.  An Efficient Heuristic Procedure for Partitioning Graphs , 2022 .

[20]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Analysis , 1999, UAI.

[21]  Claudio Castellano,et al.  Defining and identifying communities in networks. , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[22]  John D. Lafferty,et al.  Dynamic topic models , 2006, ICML.

[23]  Alex Pothen,et al.  PARTITIONING SPARSE MATRICES WITH EIGENVECTORS OF GRAPHS* , 1990 .

[24]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[25]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[26]  Juan-Zi Li,et al.  Conference Mining via Generalized Topic Modeling , 2009, ECML/PKDD.

[27]  Randy Goebel,et al.  DBconnect: mining research community on DBLP data , 2007, WebKDD/SNA-KDD '07.

[28]  Congfu Xu,et al.  Understanding Research Field Evolving and Trend with Dynamic Bayesian Networks , 2007, PAKDD.

[29]  George Karypis,et al.  Item-based top-N recommendation algorithms , 2004, TOIS.

[30]  David Heckerman,et al.  Empirical Analysis of Predictive Algorithms for Collaborative Filtering , 1998, UAI.