A Utility Model of Authors in the Scientific Community

Authoring a scientific paper is a complex process involving many decisions. We introduce a probabilistic model of some important aspects of that process: that authors have individual preferences; that writing a paper requires trading off among the authors' preferences as well as extrinsic rewards in the form of community response to their papers; and that preferences (of individuals and the community) and tradeoffs vary over time. Variants of our model improve predictive accuracy for citations given texts and for texts given authors. Further, our model’s posterior suggests an interesting relationship between seniority and author choices.
