A method of searching for related literature on protein structure analysis by considering a user's intention

Background: In recent years, advances in techniques for protein structure analysis have led to a vast number of published articles on protein structure and function. A method is needed to find specific publications in such a large pool of articles. In this paper, we propose a method that searches for related articles on protein structure analysis by using an article itself as the query.

Results: In the proposed method, each article is represented as a set of concepts. Similarities between articles are then evaluated from similarities among concepts, which are derived from databases such as Gene Ontology. Because a single article contains a variety of information, the desired search results in this framework depend on the user's search intention. The proposed method therefore takes as its query not only one input article (the primary article) but also additional articles related to it, and determines the user's search intention from the relationship between the query articles. In other words, based on the concepts contained in the primary and additional articles, the method performs a related-literature search that reflects user intention by varying the degree of attention given to each concept and by modifying the concept hierarchy graph.

Conclusions: Using three query datasets, we performed an experiment to retrieve relevant papers from articles on protein structure analysis registered in the Protein Data Bank. The search results were more accurate than those obtained when user intention was not considered, confirming the effectiveness of the proposed method.
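The general idea of concept-based similarity with intention-dependent attention can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy hierarchy and its concept names, the Wu-Palmer-style concept similarity, the `boost` factor applied to concepts shared by the primary and additional articles, and the weighted best-match aggregation. The modification of the concept hierarchy graph mentioned in the abstract is not modeled.

```python
# Illustrative sketch only: the hierarchy, the similarity formula, and the
# weighting rule are assumptions, not the definitions used in the paper.

# Toy concept hierarchy (child -> parent); all concept names are hypothetical.
PARENT = {
    "binding": "function",
    "catalysis": "function",
    "atp_binding": "binding",
    "dna_binding": "binding",
    "kinase_activity": "catalysis",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth in the hierarchy, with the root at depth 1."""
    return len(ancestors(concept))


def concept_similarity(a, b):
    """Wu-Palmer-style similarity, used here as an assumed stand-in measure."""
    path_b = set(ancestors(b))
    # Walking up from a, the first shared node is the deepest common ancestor.
    common = next((c for c in ancestors(a) if c in path_b), None)
    if common is None:
        return 0.0
    return 2 * depth(common) / (depth(a) + depth(b))


def attention_weights(primary, additional, boost=2.0):
    """Weight each primary concept; boost those also in the additional article."""
    return {c: (boost if c in additional else 1.0) for c in primary}


def article_similarity(weights, candidate):
    """Weighted mean of each query concept's best match in the candidate."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(
        w * max((concept_similarity(c, d) for d in candidate), default=0.0)
        for c, w in weights.items()
    )
    return score / total


# Hypothetical query: the additional article signals interest in ATP binding.
primary = {"atp_binding", "kinase_activity"}
additional = {"atp_binding"}
candidate_a = {"atp_binding"}
candidate_b = {"kinase_activity"}

uniform = {c: 1.0 for c in primary}                 # intention ignored
weighted = attention_weights(primary, additional)   # intention considered

for name, cand in [("A", candidate_a), ("B", candidate_b)]:
    print(name, article_similarity(uniform, cand), article_similarity(weighted, cand))
```

With uniform weights the two candidates score the same, whereas boosting the concept shared with the additional article ranks candidate A above candidate B. This is the kind of intention-dependent reordering the abstract describes, shown here only on toy data.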
