Semantic association ranking schemes for information retrieval applications using term association graph representation

Most Information Retrieval (IR) techniques represent documents using the traditional vector space model or probabilistic language models, i.e., the bag-of-words model. In this paper, associations among the words in documents are assessed and expressed in a Term Association Graph model, which represents both the document content and the relationships among keywords. An earlier attempt to exploit the term association graph addressed the non-personalized document re-ranking task. This paper experiments with improved non-personalized and personalized re-ranking strategies that exploit the term association graph data structure to assess the importance of a document for the user query; documents are then re-ranked according to the association and similarity that exist among them. The paper proposes several approaches under two models, namely the Term Rank based Approach (TRA) and the Path Traversal based Approaches (PTA1, PTA2, and PTA3). These approaches employ the term association graph and have been evaluated on a manually prepared real dataset and the benchmark OHSUMED dataset. The results obtained are reasonably promising.
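
To make the idea of re-ranking with a term association graph more concrete, the following is a minimal sketch and not the paper's TRA or PTA algorithms, whose details the abstract does not give. It assumes that two terms are associated when they co-occur in the same document, that term importance can be approximated with a weighted PageRank-style iteration, and that a document is scored by the ranks of its terms that are query terms or their graph neighbours. The function names `build_term_graph`, `term_rank`, and `rerank`, and the damping value, are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_term_graph(docs):
    """Undirected graph; edge weight = number of documents where two terms co-occur.

    Assumption: document-level co-occurrence stands in for "association".
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def term_rank(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style power iteration over the term graph (illustrative)."""
    nodes = list(graph)
    rank = {t: 1.0 / len(nodes) for t in nodes}
    for _ in range(iterations):
        new_rank = {}
        for t in nodes:
            # Each neighbour n passes on a share of its rank proportional
            # to the weight of the edge n-t among all of n's edges.
            incoming = sum(
                rank[n] * w / sum(graph[n].values())
                for n, w in graph[t].items()
            )
            new_rank[t] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


def rerank(docs, query, graph, rank):
    """Return document indices, best first.

    Assumption: a document's score is the summed rank of its terms that are
    query terms or direct graph neighbours of a query term.
    """
    related = set(query)
    for q in query:
        related.update(graph.get(q, {}))
    scores = [sum(rank.get(t, 0.0) for t in set(tokens) & related) for tokens in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy usage with pre-tokenized documents (hypothetical data).
docs = [["heart", "attack", "risk"], ["heart", "disease"], ["stock", "market"]]
graph = build_term_graph(docs)
rank = term_rank(graph)
order = rerank(docs, ["heart"], graph, rank)
```

In this toy run the document about the stock market shares no associated term with the query and is ranked last. The sketch requires at least one document containing two distinct terms, since a term with no co-occurring partner never enters the graph.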
