Paper information - Keyword and metadata extraction from pre-prints

In this paper we study how to provide metadata for a pre-print archive. Metadata includes, but is not limited to, title, authors, citations, and keywords. It is used both to present data to the user in a meaningful way and to index and cross-reference the pre-prints. We are particularly interested in studying different methods of obtaining metadata for a pre-print. We have developed a system that automatically extracts metadata and lets the user verify and correct that metadata before the system accepts it.

Henk L. Muller | Emma Tonkin
