Ontea: Platform for Pattern Based Automated Semantic Annotation

Automated annotation of web documents is a key challenge of the Semantic Web effort. Semantic metadata can be created manually or with automated annotation or tagging tools. The automated semantic annotation tools with the best results are built on machine learning algorithms, which require training sets. Another approach is to use pattern-based semantic annotation solutions built on natural language processing, information retrieval, or information extraction methods. This paper presents Ontea, a platform for automated semantic annotation or semantic tagging. An implementation based on regular expression patterns is presented together with an evaluation of its results, along with an extensible architecture for integrating pattern-based approaches. Most existing semi-automatic annotation solutions cannot demonstrate real usage on large-scale data such as the web or email communication, yet the Semantic Web can be exploited only when computer-understandable metadata reaches critical mass. We therefore also present an approach to large-scale pattern-based annotation.
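To make the idea of annotation by regular expression patterns concrete, the following is a minimal sketch and not Ontea's actual implementation. The pattern set, the concept names (`Email`, `Location`), and the `Annotation` type are all hypothetical, chosen only to illustrate how a matched text span can be paired with an ontology class.

```python
import re
from typing import List, NamedTuple


class Annotation(NamedTuple):
    concept: str  # hypothetical ontology class name
    value: str    # text matched by the pattern
    start: int    # character offset where the value begins
    end: int      # character offset where the value ends


# Illustrative patterns only (assumptions, not taken from Ontea).
# If a pattern has a capture group, group 1 is the annotated value;
# otherwise the whole match is used.
PATTERNS = {
    "Email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "Location": re.compile(r"\b(?:in|at|from)\s+([A-Z][a-z]+)\b"),
}


def annotate(text: str) -> List[Annotation]:
    """Return one annotation per pattern match, ordered by position in the text."""
    found = []
    for concept, pattern in PATTERNS.items():
        group = 1 if pattern.groups else 0
        for match in pattern.finditer(text):
            found.append(
                Annotation(concept, match.group(group),
                           match.start(group), match.end(group))
            )
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    sample = "Contact jan@example.org about the office in Bratislava."
    for annotation in annotate(sample):
        print(annotation.concept, annotation.value)
```

A sketch like this needs no training set, which is the trade-off the abstract describes. Its quality depends entirely on how well the hand-written patterns fit the target documents.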
