Matchmaking through semantic annotation and similarity measurement

This work describes an approach for automatically extracting structured information from semi-structured documents in order to match document creators with document users, identify the closest similarities between them, and connect them for further collaboration. The general idea is to combine semantic annotation with ontology-based similarity measurement to find the best matches between web documents. Ontologies are used both to annotate the extracted information and to measure the similarity between each pair of documents. GATE (General Architecture for Text Engineering), a widely used annotation tool, is used to annotate the semi-structured documents. A novel algorithm is proposed that uses a training data set to update the ontology on which GATE relies for extraction. In addition, domain-specific metrics, implemented in an ontology-based approach, measure the semantic similarity between documents on the basis of their semantic annotations. These metrics can be used to find the most similar web documents in a document corpus.
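As an illustration of ontology-based similarity between annotated documents, the following is a minimal sketch and not the metric proposed in this work. It assumes each document has already been reduced to a set of ontology concepts (for example, by an annotation step), and it substitutes the well-known Wu-Palmer depth-based measure for the domain-specific metrics mentioned above. The toy taxonomy and the names `PARENT`, `wu_palmer`, and `document_similarity` are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy (child -> parent); the root has parent None.
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Field": "Thing",
    "ComputerScience": "Field",
    "MachineLearning": "ComputerScience",
    "Databases": "ComputerScience",
    "Biology": "Field",
}


def path_to_root(concept: str) -> List[str]:
    """Return [concept, parent, ..., root]."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the deepest common ancestor.
    lcs = next(node for node in path_to_root(b) if node in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def document_similarity(annotations_a: Set[str], annotations_b: Set[str]) -> float:
    """Symmetric average of each concept's best match in the other document."""
    if not annotations_a or not annotations_b:
        return 0.0

    def directed(src: Set[str], dst: Set[str]) -> float:
        return sum(max(wu_palmer(s, d) for d in dst) for s in src) / len(src)

    return (directed(annotations_a, annotations_b)
            + directed(annotations_b, annotations_a)) / 2


if __name__ == "__main__":
    doc_a = {"MachineLearning", "Biology"}  # assumed annotation output
    doc_b = {"Databases"}
    print(document_similarity(doc_a, doc_b))
```

Ranking every document pair in a corpus by such a score yields candidate matches; in practice the concept sets would come from the annotation step and the measure would be replaced by the domain-specific metrics.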
