Discovering Factions in the Computational Linguistics Community

We present a joint probabilistic model of who cites whom in computational linguistics and of the words they use when citing. The model reveals latent factions: groups of individuals whom we expect to collaborate more closely within their faction, to cite within the faction using language distinct from that used for citations outside it, and to be largely understandable through the language used when they are cited from outside. We conduct an exploratory data analysis on the ACL Anthology and extend the model to reveal changes in some authors' faction memberships over time.
