Semantic Annotation of Unstructured Documents Using Concepts Similarity

Large amounts of information exist as unstructured documents, which poses challenges for information storage, search, and retrieval. This situation has given rise to several information search approaches, some of which take into account the contextual meaning of the terms specified in the query. Semantic annotation techniques can help to retrieve and extract information from unstructured documents. We propose a semantic annotation strategy for unstructured documents as part of a semantic search engine. In this proposal, ontologies are used to determine the context of the entities specified in the query. Our strategy for extracting the context focuses on concept similarity. Each relevant term of the document is associated with an instance in the ontology. The similarity between each of the explicit relationships is measured by combining two types of association: the association between each pair of concepts and the weight calculated for the relationships.
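
The combination of the two associations can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the toy taxonomy, the relation names and weights, the use of a Wu-Palmer style depth-based measure for the association between a pair of concepts, and the linear mixing parameter `alpha`. It is not the measure defined in the paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Place": "Thing",
    "City": "Place",
    "Museum": "Place",
    "Agent": "Thing",
    "Artist": "Agent",
}

# Hypothetical weights in [0, 1] for the explicit relationships of the ontology.
RELATION_WEIGHT: Dict[str, float] = {"locatedIn": 0.9, "exhibitedAt": 0.6}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by its ancestors, ending at the root."""
    path = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(path_to_root(concept))


def concept_similarity(a: str, b: str) -> float:
    """Wu-Palmer style similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the deepest common ancestor.
    lcs = next(node for node in path_to_root(b) if node in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def relationship_score(subject: str, relation: str, obj: str, alpha: float = 0.5) -> float:
    """Assumed linear mix of concept-pair similarity and relation weight."""
    pair_association = concept_similarity(subject, obj)
    return alpha * pair_association + (1 - alpha) * RELATION_WEIGHT[relation]


if __name__ == "__main__":
    print(relationship_score("Museum", "locatedIn", "City"))
    print(relationship_score("Artist", "exhibitedAt", "Museum"))
```

In this sketch a relationship scores higher when its two concepts sit close together in the taxonomy and when the relation itself carries a high weight. A different similarity measure or combination rule could be substituted without changing the overall structure.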
