Architecture of a concept-based information retrieval system for educational resources

In the literature, the bag-of-concepts representation of textual documents is regarded as a convenient alternative to the bag-of-words representation. The words that users choose as search terms may differ from the ones that the author of a particular document chose to refer to the same concept, which reduces recall. In addition, the bag-of-words representation does not distinguish between the different contexts in which ambiguous terms appear, which reduces the precision of search results. The objective of our research is to evaluate the applicability of the bag-of-concepts paradigm to the retrieval of educational resources. We built an information retrieval system that follows this approach and evaluated it with end users. The main contribution of this paper is the description of the architecture of that system. First evaluation results show that the information retrieval system based on bag-of-concepts works well for retrieving educational resources. The practical implication of this research is that it demonstrates that building information retrieval systems based on bag-of-concepts is feasible and that such systems are efficient for retrieving educational resources. This makes them an a priori interesting alternative for application in other domains.
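
The vocabulary mismatch described above can be illustrated with a minimal sketch. It is not the system presented in this paper: the lexicon `CONCEPTS`, the concept identifiers, and the helper functions are hypothetical and chosen only for illustration, and a real system would obtain concepts from a semantic annotator rather than from a hand-written dictionary.

```python
from collections import Counter

# Hypothetical toy lexicon: surface form -> concept identifier.
# Assumption for illustration only; not taken from the paper.
CONCEPTS = {
    "car": "C_AUTOMOBILE",
    "automobile": "C_AUTOMOBILE",
    "engine": "C_ENGINE",
    "motor": "C_ENGINE",
}

def bag_of_words(text):
    """Count lower-cased, whitespace-separated tokens."""
    return Counter(text.lower().split())

def bag_of_concepts(text):
    """Count the concepts whose surface forms occur in the text."""
    return Counter(CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS)

def overlap(a, b):
    """Number of shared items between two bags (multiset intersection)."""
    return sum((a & b).values())

query = "car motor"
document = "the automobile has a new engine"

word_overlap = overlap(bag_of_words(query), bag_of_words(document))
concept_overlap = overlap(bag_of_concepts(query), bag_of_concepts(document))
print(word_overlap, concept_overlap)  # prints "0 2"
```

In this toy case the query and the document share no words, so a bag-of-words match would miss the document, whereas both map to the same two concepts. Handling ambiguous terms would additionally require choosing a concept from the surrounding context, which this sketch does not attempt.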
