Discriminatively Enhanced Topic Models

This paper proposes a space-efficient, discriminatively enhanced topic model: a V-structured topic model with an embedded log-linear component. The discriminative log-linear component reduces the number of parameters to be learned while outperforming baseline generative models. At the same time, the explanatory power of the generative component is not compromised. We demonstrate its advantage over a purely generative model by applying it to two different ranking tasks. (a) In the first task, we look at the problem of proposing alternative citations given textual and bibliographic evidence. We treat it both as a ranking problem in its own right and as a platform for further qualitative analysis of the convergence of scientific phenomena. (b) In the second task, we address the problem of ranking potential email recipients based on email content and sender information.
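The parameter saving claimed for the log-linear component can be illustrated with a generic sketch. This is not the paper's parameterization: the feature vectors, weights, and helper names below are hypothetical, and the sketch only assumes that each topic scores candidate items (citations or recipients) through a small shared feature representation rather than storing one probability per item.

```python
import math


def log_linear_distribution(weights, features):
    """Softmax over items: p(item) is proportional to exp(weights . features[item])."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_param_count(num_topics, num_items):
    # A purely generative table: one free probability per item per topic
    # (minus one per topic, because each row sums to 1).
    return num_topics * (num_items - 1)


def log_linear_param_count(num_topics, num_features):
    # Hypothetical log-linear alternative: one weight per feature per topic.
    return num_topics * num_features


# Toy data (assumed for illustration): 4 candidate items, 2 features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = [0.5, -0.5]
probs = log_linear_distribution(weights, features)
```

Under these assumptions, 50 topics over 10,000 candidate items need 499,950 free parameters as explicit multinomials but only 1,000 weights with 20 features per item. The actual saving in the proposed model depends on its feature design, which the abstract does not specify.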
