A Hybrid Approach for Measuring Semantic Similarity between Documents and its Application in Mining the Knowledge Repositories

This paper explains similarity measures and their relationship to knowledge repositories. It also describes the significance of document similarity measures, the algorithms behind them, and the types of text to which each can be applied. Document similarity measures include full-text similarity, paragraph similarity, sentence similarity, semantic similarity, structural similarity, and statistical measures. Two frameworks are proposed in this paper: one for measuring document-to-document similarity, and another for measuring similarity between one document and multiple documents. Both frameworks can use any one of these similarity measures in their implementation, which is put forward for further research.
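
The idea that both frameworks accept any similarity measure can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `document_to_document` and `document_to_documents` are hypothetical, and bag-of-words cosine and Jaccard similarity are stand-ins chosen only because they are simple statistical measures.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A similarity measure is assumed to be any function of two texts returning a score.
Similarity = Callable[[str, str], float]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between term-frequency vectors of the two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def document_to_document(a: str, b: str,
                         measure: Similarity = cosine_similarity) -> float:
    """Hypothetical one-to-one framework: score a single pair of documents."""
    return measure(a, b)


def document_to_documents(query: str, corpus: Dict[str, str],
                          measure: Similarity = cosine_similarity
                          ) -> List[Tuple[str, float]]:
    """Hypothetical one-to-many framework: score the query against every
    document in the corpus and return (name, score) pairs, best match first."""
    scores = [(name, measure(query, text)) for name, text in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because the measure is passed in as a parameter, swapping `cosine_similarity` for `jaccard_similarity` (or for a semantic measure) changes neither framework function.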
