Automatic Document Annotation with Data Mining Algorithms

By combining semantically annotated documents with semantically annotated services, digital solutions can automatically retrieve documents and assign them not only to their own services but also to services provided by others, improving the experience of their users. Most of the information exchanged within and between services still travels on paper or by email, is mostly unstructured, and lacks any form of annotation. Manual and semi-automatic approaches cannot cope with the huge volumes of heterogeneous, constantly flowing data present in this scenario, which raises the need for automatic annotation. In this paper, three data mining algorithms are used to annotate a set of documents, and their results are compared with manually provided annotations.
