Science policy makers, university administrators, funding agencies, and prospective students all weigh many factors when deciding which academic institutions to become involved with, and organizations have stepped in to provide information to support such decisions. From the U.S. News and World Report rankings to the recently released National Research Council report on academic institutions [4], there is great interest in generating objective benchmarks of academic institutions. Traditionally, these benchmarks have focused either on the inputs associated with each institution (money raised, SAT scores of incoming students, number of grant dollars and research staff, etc.) or on the reputation of those institutions as judged by their peers. Yet by their nature, academic institutions also produce a great deal of output, usually in the form of the text of academic publications and dissertations. Such text-rich datasets tend to be overlooked in quantitative analyses of institutional performance because making effective, quantitative use of text is a challenging problem. In this study, we analyze these same institutions from a new perspective: scoring each institution by how much it looks like the future of academia, judged quantitatively from the text of its PhD dissertation abstracts.
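To make the scoring idea concrete, here is a minimal sketch of one way such a score could be computed; it is only an illustration under assumed design choices. The institution names, the toy topic proportions, the use of per-institution topic mixtures, and the choice of cosine similarity against a "future" aggregate are all hypothetical and are not taken from this study.

```python
# Hypothetical sketch: score each institution by how closely its base-year
# topic mixture resembles the aggregate topic mixture of later years.
# All numbers below are invented toy values, not data from the study.

from math import sqrt

def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy topic mixtures (e.g., as inferred by an LDA-style model) for each
# institution's dissertation abstracts in some base year.
institution_mix = {
    "Univ A": [0.50, 0.30, 0.20],
    "Univ B": [0.20, 0.30, 0.50],
    "Univ C": [0.34, 0.33, 0.33],
}

# Aggregate topic mixture of the whole field in later ("future") years.
future_mix = [0.25, 0.30, 0.45]

# Rank institutions by similarity of their base-year mixture to the future mixture.
scores = {name: cosine(mix, future_mix) for name, mix in institution_mix.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Any real implementation would of course depend on how topic mixtures are estimated and how the "future" reference distribution is defined; the sketch only shows the shape of a similarity-based ranking.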
[1] Daniel Jurafsky, et al. Studying the History of Ideas Using Topic Models. EMNLP, 2008.
[2] Michael I. Jordan, et al. Latent Dirichlet Allocation. J. Mach. Learn. Res., 2001.
[3] Jeremiah P. Ostriker, et al. A Data-Based Assessment of Research-Doctorate Programs in the United States. 2011.
[4] Ramesh Nallapati, et al. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. EMNLP, 2009.
[5] Sean Gerrish, et al. A Language-based Approach to Measuring Scholarly Impact. ICML, 2010.