Paper information - Towards Ontology-based Information Extraction and Annotation of Paper Documents for Personalized Knowledge Acquisition

Despite the advent of electronic personal information management (PIM) tools, knowledge workers still rely heavily on paper-based information sources. Yet even sophisticated PIM tools such as the Semantic Desktop neglect the knowledge worker’s paper world. As a result, electronically archiving a web page for later reference is much easier than keeping track of an interesting magazine article, whose copy may gather dust on the user’s shelf and be long forgotten by the time it would be helpful for a specific task. This paper presents how to use document analysis, ontology-based information extraction, and annotation techniques for personal knowledge acquisition in order to bridge the gap between the user’s paper world and their personal knowledge space in the Semantic Desktop. A recent prototype shows the feasibility of the approach.
