Extraction and Characterization of Citations in Scientific Papers

We propose a hybrid method for the extraction and characterization of citations in scientific papers using machine learning combined with rule-based approaches. Our protocol consists of the extraction of metadata, bibliography parsing, section titles processing, and find-grained semantic annotation on the sentence level of texts. This allows us to generate Linked Open Data from a set of research papers in XML.