Semantic ranking and result visualization for life sciences publications

An ever-increasing amount of data and semantic knowledge in the life sciences is creating new data management challenges. In this paper we add a semantic dimension to literature search, a central task in scientific research. We focus on PubMed, the most significant bibliographic source in the life sciences, and explore ways to use high-quality semantic annotations from the MeSH vocabulary to rank search results. We start by developing several families of ranking functions that relate a search query to a document's annotations. We then propose an efficient adaptive ranking mechanism for each of the families. We also describe a two-dimensional Skyline-based visualization that can be used in conjunction with the ranking to further improve the user's interaction with the system, and demonstrate how such Skylines can be computed adaptively and efficiently. Finally, we evaluate the effectiveness of our ranking with a user study.
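
To make the notion of a two-dimensional Skyline concrete, the following minimal sketch returns the documents that no other document beats on both of two scores. It is an illustration only, not the adaptive algorithm proposed in the paper: the name `skyline_2d`, the two score axes, and the convention that larger scores are better are all assumptions made for this example.

```python
from typing import List, Tuple

# (document id, score on axis 1, score on axis 2); both axes are hypothetical
# relevance scores, with larger values assumed to be better.
Point = Tuple[str, float, float]


def skyline_2d(points: List[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point.

    A point q dominates p if q is at least as good as p on both axes and
    strictly better on at least one. Exact duplicates are reported once.
    """
    # Sort by the first score descending, breaking ties by the second
    # score descending, so any potential dominator is visited before the
    # points it dominates.
    ordered = sorted(points, key=lambda p: (-p[1], -p[2]))

    result: List[Point] = []
    best_second = float("-inf")  # best second-axis score seen so far
    for p in ordered:
        # Every earlier point is at least as good on the first axis, so p
        # survives only if it is strictly better on the second axis.
        if p[2] > best_second:
            result.append(p)
            best_second = p[2]
    return result


docs = [
    ("a", 0.9, 0.2),
    ("b", 0.5, 0.5),
    ("c", 0.4, 0.4),
    ("d", 0.2, 0.9),
    ("e", 0.9, 0.1),
]
skyline_ids = [doc_id for doc_id, _, _ in skyline_2d(docs)]
```

In this toy input, documents "c" and "e" are dominated (by "b" and "a" respectively), so the Skyline consists of "a", "b", and "d". The sort-then-scan approach takes O(n log n) time and is a standard baseline for the two-dimensional case.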
