COMPARISON OF CONCEPT RECOGNIZERS FOR BUILDING THE BIOMEDICAL ANNOTATOR

Biological information will be extracted from these large and mostly unexplored data, leading to data-driven genomic, transcriptomic and epigenomic discoveries. Yet the search for relevant datasets for knowledge discovery is poorly supported: the metadata describing ENCODE datasets are quite simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata search approach that uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we built a semantic knowledge base by starting with concepts extracted from ENCODE metadata, then matching them to, and expanding them on, the biomedical ontologies integrated in the well-established Unified Medical Language System; we prove that this inference method is sound and complete. Then, we leveraged the semantic knowledge base to semantically search ENCODE data from arbitrary biologists' queries; this allows correctly finding more datasets than those retrieved by a purely syntactic search, which is what the other available systems support. We empirically show the relevance of the found datasets to the biologists' queries.
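The difference between a purely syntactic search and an ontology-expanded semantic search can be illustrated with a minimal sketch. This is not the S.O.S. GeM implementation: the toy ontology `NARROWER`, the dataset descriptions in `DATASETS`, and the function names are all hypothetical, a small dictionary stands in for the Unified Medical Language System, and plain substring matching stands in for real indexing.

```python
from collections import deque

# Hypothetical toy ontology: each concept maps to its narrower concepts.
# A real system would draw these relations from UMLS-integrated ontologies.
NARROWER = {
    "cancer": ["carcinoma", "leukemia"],
    "carcinoma": ["hepatocellular carcinoma"],
    "leukemia": [],
    "hepatocellular carcinoma": [],
}

# Hypothetical dataset metadata: dataset id -> free-text description.
DATASETS = {
    "ds1": "ChIP-seq on hepatocellular carcinoma cell line",
    "ds2": "RNA-seq on leukemia cell line",
    "ds3": "DNase-seq on embryonic stem cells",
    "ds4": "ChIP-seq on cancer tissue sample",
}


def expand(concept, narrower):
    """Return the concept together with all of its descendants."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def syntactic_search(query, datasets):
    """Match only datasets whose description contains the query string."""
    q = query.lower()
    return sorted(ds for ds, text in datasets.items() if q in text.lower())


def semantic_search(query, datasets, narrower):
    """Match datasets mentioning the query concept or any narrower concept."""
    terms = expand(query.lower(), narrower)
    return sorted(
        ds for ds, text in datasets.items()
        if any(term in text.lower() for term in terms)
    )


syntactic_hits = syntactic_search("cancer", DATASETS)
semantic_hits = semantic_search("cancer", DATASETS, NARROWER)
print(syntactic_hits)  # ['ds4']
print(semantic_hits)   # ['ds1', 'ds2', 'ds4']
```

In this toy setting the query "cancer" retrieves only the dataset that literally mentions the word under syntactic search, while the expanded search also retrieves the datasets annotated with narrower concepts, which is the kind of gain in recall the abstract describes.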