CSI-OMIM - Clinical Synopsis Search in OMIM

Background
The OMIM database is a tool used daily by geneticists. Syndrome pages include a Clinical Synopsis section that lists the known phenotypes making up a clinical syndrome. The phenotypes are written in free text, and the same phenotype is often described by different phrases because of spelling variations, typing errors, varying sentence structures and terminological variants. These variations hinder searching for syndromes and make it difficult to use the large amount of phenotypic information for research purposes. In addition, negated forms (phrases stating that a feature is absent) create false positives when the textual descriptions of phenotypes are searched, and they introduce noise into text-mining applications.

Description
Our method allows efficient and complete search of OMIM phenotypes as well as improved data mining of the OMIM phenome. Using natural language processing, each phrase is tagged with additional semantic information from UMLS and MeSH. A grammar-based method then clusters the annotated phrases into groups denoting similar phenotypes. These groups of synonymous expressions enable precise search: a query term can be matched with the many variants that appear in OMIM, while expressions that mention the query term in a negated context are not matched. On the basis of these clusters, we computed pairwise similarity among syndromes in OMIM. Using this new similarity measure, we identified 79,770 new connections between syndromes, an average of 16 new connections per syndrome. The tool is web-based and available at http://fohs.bgu.ac.il/s2g/csiomim.

Conclusions
The resulting enhanced search functionality provides clinicians with an efficient tool for diagnosis. The search application is also used to find similar syndromes for S2G, a candidate gene prioritization tool. The enhanced OMIM database we produced can further be used for bioinformatics purposes, such as linking phenotypes to genes on the basis of syndrome similarities and the known genes in Morbidmap.
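The Description outlines three steps: mapping phrase variants to phenotype clusters, excluding negated mentions at query time, and scoring syndrome pairs by the clusters they share. A minimal Python sketch of that flow follows, under stated assumptions. The hand-written `PHRASE_TO_CLUSTER` table stands in for the UMLS/MeSH tagging and grammar-based clustering. `NEGATION_CUES` is a crude NegEx-style prefix check. Jaccard overlap is a placeholder similarity measure, since the abstract does not say which measure CSI-OMIM uses. All function names, cluster IDs and syndrome entries are hypothetical.

```python
# Hypothetical sketch of negation-aware phenotype search and cluster-based
# syndrome similarity. The cluster table, negation cues and Jaccard measure
# are illustrative assumptions, not the CSI-OMIM implementation.
from __future__ import annotations

import re
from itertools import combinations

# Assumed, hand-written stand-in for clusters that CSI-OMIM derives from
# UMLS/MeSH tagging and a grammar: normalized phrase variant -> cluster ID.
PHRASE_TO_CLUSTER = {
    "short stature": "C_SHORT_STATURE",
    "small stature": "C_SHORT_STATURE",
    "mental retardation": "C_INTELLECTUAL_DISABILITY",
    "intellectual disability": "C_INTELLECTUAL_DISABILITY",
    "cleft palate": "C_CLEFT_PALATE",
}

# Minimal NegEx-style prefix cues (an assumption; real negation detection
# uses a much richer trigger list and scope rules).
NEGATION_CUES = ("no ", "not ", "without ", "absence of ", "absent ")


def normalize(phrase: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    phrase = re.sub(r"[^a-z0-9 ]+", " ", phrase.lower())
    return " ".join(phrase.split())


def to_cluster(phrase: str) -> tuple[str | None, bool]:
    """Map one synopsis phrase to (cluster ID or None, negated flag)."""
    text = normalize(phrase)
    negated = False
    for cue in NEGATION_CUES:
        if (text + " ").startswith(cue):
            text = text[len(cue):].strip()  # drop the cue, keep the finding
            negated = True
            break
    return PHRASE_TO_CLUSTER.get(text), negated


def asserted_clusters(phrases: list[str]) -> set[str]:
    """Cluster IDs a synopsis asserts, ignoring negated mentions."""
    clusters = set()
    for phrase in phrases:
        cluster, negated = to_cluster(phrase)
        if cluster is not None and not negated:
            clusters.add(cluster)
    return clusters


def search(query: str, synopses: dict[str, list[str]]) -> list[str]:
    """IDs of syndromes asserting the query's cluster under any phrase variant."""
    target, _ = to_cluster(query)
    if target is None:
        return []
    return sorted(sid for sid, phrases in synopses.items()
                  if target in asserted_clusters(phrases))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared clusters divided by all clusters of the two syndromes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(synopses: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of syndromes."""
    clusters = {sid: asserted_clusters(p) for sid, p in synopses.items()}
    return {(x, y): jaccard(clusters[x], clusters[y])
            for x, y in combinations(sorted(clusters), 2)}


# Invented example synopses (not real OMIM entries).
EXAMPLE_SYNOPSES = {
    "SYN_A": ["Short stature", "Mental retardation", "No cleft palate"],
    "SYN_B": ["Small stature", "Intellectual disability"],
    "SYN_C": ["Cleft palate", "Normal intelligence"],
}

print(search("short stature", EXAMPLE_SYNOPSES))  # ['SYN_A', 'SYN_B']
print(search("cleft palate", EXAMPLE_SYNOPSES))   # ['SYN_C']
print(pairwise_similarity(EXAMPLE_SYNOPSES))
# {('SYN_A', 'SYN_B'): 1.0, ('SYN_A', 'SYN_C'): 0.0, ('SYN_B', 'SYN_C'): 0.0}
```

A query for "cleft palate" does not return SYN_A, whose only mention is "No cleft palate", although a plain substring search would have matched it. Likewise, "short stature" reaches SYN_B through the variant "Small stature", and SYN_A and SYN_B score as identical because their differently worded phrases fall into the same clusters.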
