Multi-Objective GP Strategies for Topical Search Integrating Wikipedia Concepts

Genetic Programming (GP) techniques have shown great potential for automatic query generation. This work explores several Multi-Objective Genetic Programming strategies for evolving a collection of topic-based Boolean queries. It compares three approaches to building topical Boolean queries: using terms, using Wikipedia semantics (Wikipedia concepts), and a hybrid approach that combines both terms and concepts. In addition, different fitness functions are combined, giving rise to seven multi-objective schemes. In particular, we investigate the proposed strategies in conjunction with novel fitness functions that aim at high diversity, based on the information-theoretic notion of entropy and on Jaccard similarity. Experiments were conducted on 25 topics drawn from a dataset of approximately 350,000 webpages classified into 448 topics. The results reveal that using Wikipedia concepts does not yield statistically significant improvements in precision, global recall or diversity compared with the term-based approaches. However, concepts have a positive effect on query interpretability, since the use of terms leads to artificial queries that are hard for humans to interpret. Moreover, concept-based queries contain fewer operands than term-based ones, resulting in shorter execution times without a loss in retrieval performance.
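The objectives named in the abstract (precision, global recall, and entropy- and Jaccard-based diversity) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: a query is a binary tree of AND/OR operators over operands (terms or Wikipedia concept identifiers) evaluated against an inverted index; global recall is the recall of the union of the results returned by all queries in the population; entropy diversity is the Shannon entropy of the operand distribution across the population; and Jaccard diversity is one minus the mean pairwise Jaccard similarity of the result sets. The function names and the toy index are hypothetical, so treat this as one plausible formulation rather than the authors' exact fitness functions.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Union

# Hypothetical encoding: an operand is a string (term or Wikipedia concept id);
# an internal node is a tuple (operator, left, right) with operator AND or OR.
# NOT is omitted to keep the sketch short.
Query = Union[str, tuple]


def evaluate(query: Query, index: Dict[str, Set[int]]) -> Set[int]:
    """Return the ids of documents matching a Boolean query, using an inverted index."""
    if isinstance(query, str):
        return index.get(query, set())
    op, left, right = query
    a, b = evaluate(left, index), evaluate(right, index)
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    raise ValueError(f"unknown operator: {op}")


def precision(retrieved: Set[int], relevant: Set[int]) -> float:
    """Fraction of retrieved documents that are on-topic."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def global_recall(results: List[Set[int]], relevant: Set[int]) -> float:
    """Assumed definition: recall of the union of all queries' result sets."""
    union = set().union(*results)
    return len(union & relevant) / len(relevant) if relevant else 0.0


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity |A & B| / |A | B| (0 when both sets are empty)."""
    both = a | b
    return len(a & b) / len(both) if both else 0.0


def jaccard_diversity(results: List[Set[int]]) -> float:
    """Assumed diversity score: 1 minus the mean pairwise Jaccard similarity."""
    pairs = list(combinations(results, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def operands(query: Query) -> List[str]:
    """List the leaf operands (terms or concepts) of a query tree."""
    if isinstance(query, str):
        return [query]
    _, left, right = query
    return operands(left) + operands(right)


def entropy_diversity(population: List[Query]) -> float:
    """Assumed diversity score: Shannon entropy (bits) of operand frequencies."""
    counts = Counter(op for q in population for op in operands(q))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy inverted index and relevance judgements (hypothetical data).
index = {
    "jaguar": {1, 2, 3},
    "Jaguar_Cars": {1, 2},  # stands in for a Wikipedia concept id
    "engine": {2, 4},
    "rainforest": {3, 5},
}
relevant = {1, 2, 4}
population = [("AND", "jaguar", "engine"), ("OR", "Jaguar_Cars", "engine")]
results = [evaluate(q, index) for q in population]

for q, r in zip(population, results):
    print(q, "precision:", f"{precision(r, relevant):.3f}", "operands:", len(operands(q)))
print("global recall:", f"{global_recall(results, relevant):.3f}")
print("Jaccard diversity:", f"{jaccard_diversity(results):.3f}")
print("entropy diversity:", f"{entropy_diversity(population):.3f}")
```

In a multi-objective scheme, scores like these would be computed per query or per population and handed to the evolutionary algorithm as separate objectives rather than summed. The `operands` helper also gives the operand count, the query-size measure used when comparing concept-based and term-based queries.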
