More Agility to Semantic Similarities Algorithm Implementations

The design of algorithms for measuring semantic similarity between Gene Ontology (GO) terms has become a popular area of research in bioinformatics, as such measures can help detect functional associations between genes and their potential impact on the health and well-being of humans, animals, and plants. While research focuses on designing and improving GO semantic similarity algorithms, these algorithms must still be implemented before they can be used to solve actual biological problems. This can be challenging, because potential users usually come from a biology background and are not programmers. Implementations exist for a number of well-established algorithms, but they are not generic enough to support any algorithm other than the ones they were designed for. The aim of this paper is to shift the burden of implementation away from researchers, allowing them to concentrate on algorithm design and execution. This is achieved through an implementation approach capable of understanding and executing user-defined GO semantic similarity algorithms, where the user defines an algorithm by answering a series of questions. The approach also understands any directed acyclic graph in an Open Biomedical Ontologies (OBO)-like format, together with its annotations. Software developers of similar applications can also benefit by using it as a template for their own applications.
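To make the idea of a generic engine concrete, the following minimal Python sketch reads an OBO-like directed acyclic graph and applies a pluggable similarity measure to two terms. It is not the implementation described in this paper: the function names (`parse_obo`, `ancestors`, `jaccard_similarity`, `term_similarity`), the term identifiers in `SAMPLE`, and the choice of a shared-ancestor Jaccard measure are all assumptions made for illustration, and the question-and-answer definition step is stood in for by a plain function argument.

```python
from collections import defaultdict

# Hypothetical OBO-like sample; the identifiers are made up for illustration.
SAMPLE = """
[Term]
id: T:1
name: root

[Term]
id: T:2
is_a: T:1 ! root

[Term]
id: T:3
is_a: T:1 ! root

[Term]
id: T:4
is_a: T:2
is_a: T:3
"""

def parse_obo(text):
    """Parse OBO-like [Term] stanzas into a child -> set(parents) map."""
    parents = defaultdict(set)
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = (line == "[Term]")
            current = None
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
            parents[current]  # register the term even if it has no parents
        elif in_term and current and line.startswith("is_a:"):
            # Drop the trailing "! name" comment that is_a lines may carry.
            parents[current].add(line[5:].split("!")[0].strip())
    return dict(parents)

def ancestors(term, parents):
    """Return the term itself plus every term reachable via is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen

def jaccard_similarity(a, b, parents):
    """Assumed example measure: shared ancestors over all ancestors."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

def term_similarity(a, b, parents, measure=jaccard_similarity):
    """Apply whichever user-supplied measure was passed in."""
    return measure(a, b, parents)

graph = parse_obo(SAMPLE)
print(term_similarity("T:4", "T:2", graph))
```

Because the measure is passed in as an ordinary function, a different algorithm can be substituted without touching the parser or the graph traversal, which is the separation of concerns the abstract argues for.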
