Extracting and Querying Relations in Scientific Papers on Language Technology

We describe methods for extracting interesting factual relations from scientific texts in computational linguistics and language technology taken from the ACL Anthology. We use a hybrid NLP architecture that combines shallow preprocessing, for increased robustness, and domain-specific, ontology-based named entity recognition with a subsequent deep HPSG parser running the English Resource Grammar (ERG). The extracted relations, in MRS (Minimal Recursion Semantics) format, are simplified and generalized using WordNet. The resulting “quriples” are stored in a database, from which they can be retrieved by relation-based search, again using abstraction methods. The query interface is embedded in a web browser-based application we call the Scientist’s Workbench, which supports researchers in editing scientific papers and searching them online.
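The storage and retrieval step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the fields of a quriple, the database schema, or how WordNet is queried. The field layout (subject, predicate, object, rest), the table name, the function names, and the small hand-written `PREDICATE_CLASSES` map (a stand-in for WordNet-based generalization) are all hypothetical and do not reproduce the system described here.

```python
import sqlite3

# Hypothetical stand-in for WordNet-based generalization: each predicate
# lemma is mapped to a representative of its synonym class.
PREDICATE_CLASSES = {
    "outperform": "outperform",
    "surpass": "outperform",
    "exceed": "outperform",
    "use": "use",
    "employ": "use",
    "utilize": "use",
}


def generalize(lemma):
    """Return the class representative for a lemma, or the lemma itself."""
    lemma = lemma.lower()
    return PREDICATE_CLASSES.get(lemma, lemma)


def create_store():
    """Create an in-memory table with an assumed quriple layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quriple ("
        "subject TEXT, predicate TEXT, predicate_class TEXT, "
        "object TEXT, rest TEXT, paper_id TEXT)"
    )
    return conn


def add_quriple(conn, subject, predicate, obj, rest, paper_id):
    """Store a relation together with its generalized predicate class."""
    conn.execute(
        "INSERT INTO quriple VALUES (?, ?, ?, ?, ?, ?)",
        (subject, predicate, generalize(predicate), obj, rest, paper_id),
    )


def search(conn, predicate, subject=None, obj=None):
    """Relation-based search: match on the predicate class, so a query for
    one verb also retrieves relations expressed with its synonyms."""
    sql = (
        "SELECT subject, predicate, object, rest, paper_id "
        "FROM quriple WHERE predicate_class = ?"
    )
    params = [generalize(predicate)]
    if subject is not None:
        sql += " AND subject LIKE ?"
        params.append(f"%{subject}%")
    if obj is not None:
        sql += " AND object LIKE ?"
        params.append(f"%{obj}%")
    sql += " ORDER BY rowid"
    return conn.execute(sql, params).fetchall()
```

In this sketch the abstraction is applied twice, once when a relation is stored and once when a query is issued, so a search for "outperform" would also return a stored relation whose predicate is "surpass". A real system would presumably derive such classes from WordNet synsets rather than from a fixed map.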
