PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles

Paper presented at the Language Resources and Evaluation Conference (LREC) 2018, held 7 to 12 May 2018 in Miyazaki, Japan.

Daniel Ferrés | Horacio Saggion | Àlex Bravo | Francesco Ronzano

[1] Daniel Ferrés, et al. Multi-level mining and visualization of scientific text collections: Exploring a bi-lingual scientific repository, 2017, WOSP@JCDL.

[2] Andrei Voronkov, et al. PDFX: fully-automated PDF-to-XML conversion of scientific literature, 2013, ACM Symposium on Document Engineering.

[3] Horacio Saggion, et al. Knowledge Extraction and Modeling from Scientific Publications, 2016.

[4] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[5] Patrice Lopez, et al. GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications, 2009, ECDL.

[6] Akiko Aizawa, et al. SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation, 2016, COLING.

[7] Lluís Padró, et al. FreeLing 3.0: Towards Wider Multilinguality, 2012, LREC.

[8] Min-Yen Kan, et al. Logical Structure Recovery in Scholarly Articles with Rich Document Features, 2010, Int. J. Digit. Libr. Syst.

[9] Dominika Tkaczyk, et al. CERMINE -- Automatic Extraction of Metadata and References from Scientific Literature, 2014, 2014 11th IAPR International Workshop on Document Analysis Systems.