Automatic acknowledgement indexing: expanding the semantics of contribution in the CiteSeer digital library

Acknowledgements in research publications, like citations, indicate influential contributions to scientific work; however, large-scale acknowledgement analyses have traditionally been impractical due to the high cost of manual information extraction. In this paper we describe a mixture method for automatically mining acknowledgements from research documents using a combination of a Support Vector Machine and regular expressions. The algorithm has been implemented as a plug-in to the CiteSeer Digital Library and the extraction results have been integrated with the traditional metadata and citation index of the CiteSeer system. As a demonstration, we use CiteSeer's autonomous citation indexing (ACI) feature to measure the relative impact of acknowledged entities, and present the top twenty acknowledged entities within the archive.
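The regular-expression half of such a pipeline might look roughly like the sketch below. It is an illustration only: the abstract does not give the actual expressions or features, so the patterns `THANK` and `NAME`, the helper names, and the sample sentences are all hypothetical. The Support Vector Machine step is not reproduced here; the sketch assumes the acknowledgement passages have already been identified. Entities are ranked by a plain mention count, which is simpler than the citation-based impact measure the paper describes.

```python
import re
from collections import Counter

# Hypothetical trigger phrases; the paper does not list its actual expressions.
THANK = re.compile(
    r"\b(?:thanks\s+to|thank|grateful\s+to|supported\s+by|funded\s+by)\s+"
    r"(?P<rest>[^.;]+)",
    re.IGNORECASE,
)
# A run of capitalized words is treated as one entity name (a crude heuristic;
# it does not handle initials such as "J. Smith").
NAME = re.compile(r"\b[A-Z][\w&-]*(?:\s+[A-Z][\w&-]*)*")


def extract_entities(passage):
    """Return candidate acknowledged entities from one acknowledgement passage."""
    entities = []
    for match in THANK.finditer(passage):
        entities.extend(NAME.findall(match.group("rest")))
    return entities


def count_acknowledged(passages):
    """Count how often each entity is acknowledged across many passages."""
    counts = Counter()
    for passage in passages:
        counts.update(extract_entities(passage))
    return counts


if __name__ == "__main__":
    sample = [
        "We thank Jane Doe and John Smith for helpful comments. "
        "This work was supported by the Example Science Fund.",
        "The authors are grateful to Jane Doe for discussions.",
    ]
    for entity, n in count_acknowledged(sample).most_common(3):
        print(entity, n)
```

In a fuller system, a trained classifier would decide which passages are acknowledgements before patterns like these are applied, and the extracted names would be normalized and linked to the citation index.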
