Multilingual Content Extraction Extended with Background Knowledge for Military Intelligence

Abstract: Written information for military purposes is available in abundance, and the documents are written in many languages. The question is how the content extraction of these documents can be automated. One possible approach is based on shallow parsing (information extraction) with an application-specific combination of the analysis results. One example of this approach, the ZENON research system, performs a partial content analysis of some English, Dari, and Tajik texts. A second principal approach to content extraction combines deep and shallow parsing with logical inferences on the analysis results. In the project "Multilingual content analysis with semantic inference on military relevant texts" (mIE), we followed the second approach. In this paper, we present the results of the mIE project. First, we briefly contrast the ZENON project with the mIE project. The main part of the paper then presents the mIE project. After explaining the combined deep and shallow parsing approach with Head-driven Phrase Structure Grammars, we introduce the inference process. We then show how background knowledge (WordNet, YAGO) is integrated into the logical inferences to increase the extent, quality, and accuracy of the content extraction. The prototype is also presented. The presentation includes briefing charts.
