A method for supporting retrieval of articles on protein structure analysis considering users’ intention

Background: In recent years, information about protein structure and function has been described in a large number of articles. However, a naive full-text search with specific keywords often fails to find the desired articles, because the articles involve ambiguous and complicated concepts that cannot be described with a uniform representation. To retrieve articles on protein structure and function, it is important to consider the relevance between structural and/or functional concepts by identifying the user’s intention.

Results: We introduce a scheme for evaluating the relevance between articles based on various biological databases and ontologies covering the structures and functions of proteins. The relevance, which is defined as a path length between concepts in the hierarchies, is modified adaptively based on additional articles supplied as a query, in order to reflect the user’s intention. We also implemented a retrieval system in which the user can input articles as a query and the related articles are retrieved and displayed on a 2D map.

Conclusions: The effectiveness of the proposed system was confirmed experimentally by showing that users can easily obtain highly related articles that reflect their intention.
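To make the idea of relevance as a path length between concepts more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the toy hierarchy `PARENT`, the function names, and the averaging rule in `article_distance` are all assumptions introduced for illustration, and the adaptive modification from query articles described in the abstract is not modeled here.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy (child -> parent). The concept names are
# illustrative only and are not copied from any real ontology release.
PARENT: Dict[str, Optional[str]] = {
    "protein function": None,
    "binding": "protein function",
    "catalytic activity": "protein function",
    "ion binding": "binding",
    "ATP binding": "binding",
    "kinase activity": "catalytic activity",
}


def ancestors(concept: str) -> List[str]:
    """Return the chain from `concept` up to the root, concept first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_length(a: str, b: str) -> int:
    """Number of edges on the path between a and b via their lowest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(ancestors(a))}
    for steps_from_b, node in enumerate(ancestors(b)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError("concepts share no common ancestor")


def article_distance(concepts_a: List[str], concepts_b: List[str]) -> float:
    """Assumed aggregation: for each concept of article A, take the shortest
    path to any concept of article B, then average. Note that this is not
    symmetric in A and B."""
    nearest = [min(path_length(a, b) for b in concepts_b) for a in concepts_a]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    print(path_length("ion binding", "ATP binding"))
    print(path_length("ion binding", "kinase activity"))
    print(article_distance(["ion binding", "kinase activity"], ["ATP binding"]))
```

A shorter path means the two concepts are more closely related in the hierarchy. In a setting like the one described above, an adaptive scheme might reweight edges or concepts according to the query articles; that step is left out because the abstract does not specify how it is done.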
