AckSeer: a repository and search engine for automatically extracted acknowledgments from digital libraries

Acknowledgments are widely used in scientific articles to express gratitude and credit collaborators. Although automatically indexing acknowledgments has been suggested as a source of useful insights, to the best of our knowledge no existing system tracks and indexes them. In this paper we introduce AckSeer, a search engine and repository for acknowledgments automatically extracted from digital libraries. AckSeer is a fully automated system that scans items in digital libraries, including conference papers, journals, and books, extracts their acknowledgment sections, and identifies the acknowledged entities mentioned within them. We describe the architecture of AckSeer and discuss the extraction algorithms, which achieve an F1 measure above 83%. We use multiple Named Entity Recognition (NER) tools and propose a method for merging the output of the different recognizers. The resulting entities are stored in a database and then made searchable by adding them to the AckSeer index along with the metadata of the containing paper or book. We build AckSeer on top of the documents in the CiteSeerX digital library, yielding more than 500,000 acknowledgments and more than 4 million mentioned entities.
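
The abstract does not say how the outputs of the different recognizers are merged. As a purely illustrative sketch, and not AckSeer's actual method, one simple possibility is to keep an entity only when a minimum number of recognizers agree on it. The function name `merge_entities`, the `min_votes` parameter, the tool names, and the `(text, label)` input format are all assumptions made for this example.

```python
from collections import defaultdict


def merge_entities(outputs, min_votes=2):
    """Merge (text, label) pairs from several recognizers by simple voting.

    Hypothetical scheme for illustration only: an entity is kept when at
    least `min_votes` distinct tools report the same normalized text with
    the same label.
    """
    votes = defaultdict(set)
    for tool_name, entities in outputs.items():
        for text, label in entities:
            # Normalize case and whitespace so trivial variants match.
            key = (" ".join(text.lower().split()), label)
            votes[key].add(tool_name)
    return sorted(key for key, tools in votes.items() if len(tools) >= min_votes)


# Made-up recognizer outputs for one acknowledgment section.
outputs = {
    "tool_a": [("National Science Foundation", "ORG"), ("John Smith", "PER")],
    "tool_b": [("national science  foundation", "ORG"), ("Smith", "PER")],
    "tool_c": [("John Smith", "PER")],
}
merged = merge_entities(outputs)
```

Under these assumptions, `merged` keeps the two entities reported by at least two tools and drops the lone "Smith" mention. Exact-match voting favors precision; a real merger would likely also need to reconcile partially overlapping mentions.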
