User centered and ontology based information retrieval system for life sciences

Background: Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Semantic Web technologies and applications based on domain ontologies have brought some improvements. In the life sciences, for instance, the Gene Ontology is widely used in genomic applications, and the Medical Subject Headings (MeSH) underpin the indexing of biomedical publications and the information retrieval process offered by PubMed. However, current search engines suffer from two main drawbacks: user interaction with the list of retrieved resources is limited, and no explanation is given of why those resources match the query. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations.

Results: This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents retrieved and that uses a graphical rendering of query results to encourage user interaction. Semantic proximities between ontology concepts, combined with aggregation models, are used to assess how well each document matches a query. The selected documents are displayed on a semantic map that gives graphical indications of the extent to which they match the user's query; this human-machine interface supports a more interactive and iterative exploration of the data corpus by facilitating the weighting of query concepts and visual explanation. We illustrate the benefit of this information retrieval system with two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway.

Conclusions: The ontology-based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user-centred application in which the system highlights relevant information to support decision making.
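
The idea of scoring a document by combining concept-level semantic proximities can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy is-a hierarchy, the Wu-Palmer-style proximity, the weighted-mean aggregation, and the names `PARENT`, `similarity` and `score` are hypothetical and are not taken from OBIRS, whose actual measures and aggregation models are not specified in this abstract.

```python
from typing import Dict, List, Optional

# Toy is-a hierarchy (child -> parent). Purely illustrative, not a real ontology.
PARENT: Dict[str, Optional[str]] = {
    "process": None,
    "transcription": "process",
    "regulation of transcription": "transcription",
    "hemopoiesis": "process",
    "erythrocyte differentiation": "hemopoiesis",
}


def ancestors(concept: str) -> List[str]:
    """Return the path from a concept up to the root, concept included."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def similarity(a: str, b: str) -> float:
    """Wu-Palmer-style proximity: 2*depth(lca) / (depth(a) + depth(b)).

    Depth counts nodes on the path to the root, so the root has depth 1.
    """
    path_a, path_b = ancestors(a), ancestors(b)
    in_b = set(path_b)
    # In a tree, the first node of path_a that also lies on path_b
    # is the deepest common ancestor.
    lca = next(node for node in path_a if node in in_b)
    return 2 * len(ancestors(lca)) / (len(path_a) + len(path_b))


def score(query: Dict[str, float], annotations: List[str]) -> float:
    """Aggregate per-concept matches into one document score.

    For each weighted query concept, keep its best proximity to any of the
    document's annotations, then take the weighted mean (an assumed model).
    """
    total_weight = sum(query.values())
    weighted = sum(
        weight * max(similarity(concept, ann) for ann in annotations)
        for concept, weight in query.items()
    )
    return weighted / total_weight


if __name__ == "__main__":
    query = {"hemopoiesis": 1.0, "transcription": 1.0}
    document = ["erythrocyte differentiation", "regulation of transcription"]
    print(round(score(query, document), 3))
```

Raising the weight of one query concept in `query` shifts the score towards documents annotated close to that concept, which is the kind of query concept weighting the interface is described as supporting.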
