Keyword and metadata extraction from pre-prints

In this paper we study how to provide metadata for a pre-print archive. Metadata includes, but is not limited to, title, authors, citations, and keywords. It is used both to present data to the user in a meaningful way and to index and cross-reference the pre-prints. We are particularly interested in studying different methods of obtaining metadata for a pre-print. We have developed a system that extracts metadata automatically and allows the user to verify and correct the metadata before it is accepted by the system.
