PheKnow–Cloud: A Tool for Evaluating High-Throughput Phenotype Candidates using Online Medical Literature

As the adoption of Electronic Healthcare Records (EHRs) has grown, so has the need to transform the manual processes that extract and characterize medical data into automatic, high-throughput processes. Recently, researchers have tackled the problem of automatically extracting candidate phenotypes from EHR data. Since these phenotypes are usually generated using unsupervised or semi-supervised methods, it is necessary to examine and validate the clinical relevance of the generated “candidate” phenotypes. We present PheKnow–Cloud, a framework that uses co-occurrence analysis on PubMed, the publicly available online repository of journal articles, to build sets of evidence for user-supplied candidate phenotypes. PheKnow–Cloud presents the results of the candidate phenotype analysis interactively. The tool seeks to help researchers and clinical professionals evaluate automatically generated phenotypes so that they can tune their processes and understand the candidate phenotypes.
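The abstract does not specify how co-occurrence is scored, so the following is only a minimal sketch of the general idea, not the PheKnow–Cloud implementation. It assumes that each article has already been reduced to a set of matched terms and uses lift (observed co-occurrence divided by the co-occurrence expected under independence) as an illustrative measure. The function name `cooccurrence_lift` and the toy corpus are hypothetical.

```python
from itertools import combinations


def cooccurrence_lift(articles, phenotype_terms):
    """Compute pairwise lift for the terms of one candidate phenotype.

    articles: list of sets, each holding the terms found in one article
              (a stand-in for term matches in a literature corpus).
    phenotype_terms: iterable of terms making up the candidate phenotype.
    Returns a dict mapping (term_a, term_b) to its lift.
    """
    n = len(articles)
    terms = sorted(set(phenotype_terms))
    # Number of articles that mention each individual term.
    single = {t: sum(1 for art in articles if t in art) for t in terms}

    lifts = {}
    for a, b in combinations(terms, 2):
        # Number of articles that mention both terms.
        both = sum(1 for art in articles if a in art and b in art)
        if single[a] == 0 or single[b] == 0:
            # A term that never appears gives no evidence either way.
            lifts[(a, b)] = 0.0
            continue
        # lift = P(a, b) / (P(a) * P(b)); values above 1 suggest the
        # terms appear together more often than chance would predict.
        lifts[(a, b)] = (both / n) / ((single[a] / n) * (single[b] / n))
    return lifts


# Hypothetical toy corpus: each set is the terms found in one article.
toy_articles = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "insulin"},
    {"diabetes", "insulin"},
    {"fracture"},
]
toy_lifts = cooccurrence_lift(toy_articles, ["hypertension", "diabetes", "insulin"])
for pair, value in sorted(toy_lifts.items()):
    print(pair, round(value, 3))
```

In a sketch like this, a candidate phenotype whose term pairs mostly have lift well above 1 would have stronger literature support than one whose pairs hover near or below 1. How such pairwise scores are aggregated into evidence sets is a design choice the abstract leaves open.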
