Context Annotated Graph and Fuzzy Similarity Based Document Descriptor

Document descriptors have been widely researched and developed in the NLP literature. This study addresses the task of designing context-aware document descriptors that can be used for information retrieval (IR) based on generic queries. We propose a graph-based data structure that preserves both the contextual and the structural information of a document. We also design a novel metric for calculating the fuzzy similarity between two vectors. Three types of context were used: collocational words that frequently co-occur with the keyword in the given paper (C1), collocational words that co-occur with the keyword in the defined corpus (C2), and synonyms of the keywords from the WordNet lexical database (C3). Each document in a corpus of research papers was represented as a context annotated graph (CAG). These graphs were vectorized, and the similarity between each paper vector and the query vector was calculated using the fuzzy similarity metric. We compare the results of generic search queries using two performance metrics, average weighted precision (AWP) and discounted cumulative gain (DCG). The descriptor that combines contextual information with the fuzzy similarity algorithm (AWP score of 0.915 and DCG score of 0.922) outperforms the baseline of non-contextual descriptors ranked by cosine similarity (AWP score of 0.752 and DCG score of 0.796). We also found that contextual information yields better results with fuzzy similarity than with cosine similarity, which reaches an AWP score of 0.865 and a DCG score of 0.912.
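The abstract does not spell out how a context annotated graph is built or vectorized. The following is a minimal sketch of one hypothetical reading, assuming: keyword nodes supplied by the caller, edges weighted by sentence-level co-occurrence, C1 annotations taken as the non-keyword words that share a sentence with each keyword, and C3 synonyms passed in as a plain dictionary standing in for a WordNet lookup. C2 (corpus-level collocations) is omitted. The names build_cag and vectorize and the parameters top_k and context_weight are illustrative, not taken from the paper.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    keyword: str
    frequency: int = 0
    # C1 (assumed): words sharing a sentence with the keyword in this document
    c1: Counter = field(default_factory=Counter)
    # C3 (assumed): synonyms from an external lexicon such as WordNet, supplied by the caller
    c3: set = field(default_factory=set)


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def build_cag(text, keywords, synonyms=None, stopwords=frozenset()):
    """Hypothetical CAG: keyword nodes plus sentence-level co-occurrence edge counts."""
    synonyms = synonyms or {}
    nodes = {k: Node(k, c3=set(synonyms.get(k, ()))) for k in keywords}
    edges = Counter()
    for sentence in re.split(r"[.!?]", text):
        tokens = tokenize(sentence)
        present = sorted({t for t in tokens if t in nodes})
        for k in present:
            nodes[k].frequency += tokens.count(k)
            # Annotate the node with its in-sentence collocates (excluding other keywords)
            nodes[k].c1.update(
                t for t in tokens if t not in nodes and t not in stopwords
            )
        # Structural information: count each keyword pair seen in the same sentence
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                edges[(a, b)] += 1
    return nodes, edges


def vectorize(nodes, top_k=3, context_weight=0.5):
    """Flatten node annotations into a term -> weight map; context terms are discounted."""
    vec = Counter()
    for node in nodes.values():
        if node.frequency == 0:
            continue
        vec[node.keyword] += node.frequency
        for word, _ in node.c1.most_common(top_k):
            vec[word] += context_weight * node.frequency
        for syn in node.c3:
            vec[syn] += context_weight * node.frequency
    return dict(vec)


if __name__ == "__main__":
    text = ("Fuzzy similarity ranks documents. Graph descriptors encode context. "
            "Fuzzy graph matching uses similarity.")
    nodes, edges = build_cag(text, ["fuzzy", "graph", "similarity"],
                             synonyms={"similarity": ["resemblance"]})
    print(edges[("fuzzy", "similarity")])  # 2
    print(vectorize(nodes))
```

In this sketch the edge counts hold the structural information, while vectorize uses only the node annotations; how the original method folds graph structure into its vectors is not stated in the abstract.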
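The paper's own fuzzy similarity metric is described only as novel, so its formula cannot be reproduced from the abstract. To illustrate how a fuzzy measure differs from the cosine baseline, the sketch below substitutes a standard min/max ratio (a fuzzy Jaccard index) over term weights that are first rescaled into [0, 1] as membership degrees. Vectors are assumed to be sparse term-to-weight dictionaries; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Baseline: cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuzzy_minmax_similarity(a, b):
    """Stand-in fuzzy measure (not the paper's): sum of min memberships / sum of max."""
    def to_membership(v):
        # Scale weights by the largest one so each lies in [0, 1]
        top = max(v.values(), default=0.0)
        return {t: w / top for t, w in v.items()} if top > 0 else {}

    ma, mb = to_membership(a), to_membership(b)
    terms = set(ma) | set(mb)
    num = sum(min(ma.get(t, 0.0), mb.get(t, 0.0)) for t in terms)
    den = sum(max(ma.get(t, 0.0), mb.get(t, 0.0)) for t in terms)
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    query = {"fuzzy": 1.0, "graph": 1.0}
    doc = {"fuzzy": 2.0, "graph": 1.0, "context": 1.0}
    print(round(cosine_similarity(query, doc), 3))        # 0.866
    print(round(fuzzy_minmax_similarity(query, doc), 3))  # 0.6
```

Cosine similarity ignores how far individual weights differ once the vectors point in a similar direction, whereas the min/max ratio penalizes every mismatch in membership degree, including terms present in only one vector. Whether the authors' metric behaves this way cannot be determined from the abstract.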
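DCG rewards rankings that place highly relevant papers near the top. Because the reported DCG values lie between 0 and 1, they are presumably normalized (nDCG), although the abstract does not say so. The sketch below uses the common formulation in which the gain at 1-based rank i is rel_i / log2(i + 1), with graded relevance judgments as input; other discounting variants exist. AWP is not sketched, since the abstract does not define it.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over 1-based ranks i."""
    rels = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (relevance-sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Graded relevance of the top five results, listed in ranked order
    ranking = [3, 2, 3, 0, 1]
    print(round(ndcg(ranking), 3))  # 0.972
```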
