The Generation of a Lung Cancer Health Factor Distribution Using Patient Graphs Constructed From Electronic Medical Records: Retrospective Study

Background Electronic medical records (EMRs) of patients with lung cancer (LC) capture a variety of health factors. Understanding the distribution of these factors will help identify key factors for risk prediction in preventive screening for LC.
Objective We aimed to generate an integrated biomedical graph for LC from EMR data and the Unified Medical Language System (UMLS) ontology, and to generate an LC health factor distribution from a hospital EMR of approximately 1 million patients.
Methods The data were collected from 2 sets of 1397 patients each, one with and one without LC. A patient-centered health factor graph was plotted with 108,000 standardized data points, and a graph database was generated to integrate the graphs of patient health factors and the UMLS ontology. With the patient graph, we calculated the connection delta ratio (CDR) for each health factor to measure the relative strength of the factor's relationship to LC.
Results The patient graph had 93,000 relations between the 2794 patient nodes and 650 factor nodes. An LC graph with 187 related biomedical concepts and 188 horizontal biomedical relations was plotted and linked to the patient graph. Searching the integrated biomedical graph with any number or category of health factors returned graphical representations of the relationships between patients and factors, while searching by patient presented that patient's health factors from the EMR and the LC knowledge graph (KG) from the UMLS in the same graph. Sorting the health factors by CDR in descending order generated a distribution of health factors for LC. The top 70 CDR-ranked factors in the disease, symptom, medical history, observation, and laboratory test categories were verified to be concordant with those found in the literature.
Conclusions By collecting standardized EMR data of thousands of patients with and without LC, it was possible to generate a hospital-wide, patient-centered health factor graph for graph search and presentation. The patient graph could be integrated with the UMLS KG for LC, which would enable hospitals to bring continuously updated, international standard biomedical KGs from the UMLS into clinical use. CDR analysis of the graph of patients with LC generated a CDR-sorted distribution of health factors, in which the top CDR-ranked health factors were concordant with the literature. The resulting distribution of LC health factors can be used to help personalize risk evaluation and preventive screening recommendations.
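
The CDR ranking described above can be illustrated with a minimal sketch. The abstract does not give the CDR formula, so the definition used here is an assumption for illustration only: for each factor, the number of connected patients with LC minus the number of connected patients without LC, divided by the total number of connected patients. The function names, patient identifiers, and toy factors are hypothetical, and the sketch assumes equally sized cohorts (as in the 2 sets of 1397 patients) so that raw connection counts are comparable.

```python
from collections import defaultdict


def connection_delta_ratios(edges, lc_patients):
    """Compute an ASSUMED connection delta ratio per health factor.

    edges: iterable of (patient_id, factor) pairs from the patient graph.
    lc_patients: set of patient ids in the LC cohort; all others are controls.
    Assumed formula (not from the source): (n_lc - n_control) / (n_lc + n_control).
    """
    lc_links = defaultdict(set)
    control_links = defaultdict(set)
    for patient_id, factor in edges:
        bucket = lc_links if patient_id in lc_patients else control_links
        bucket[factor].add(patient_id)  # sets avoid double-counting a patient

    ratios = {}
    for factor in set(lc_links) | set(control_links):
        n_lc = len(lc_links[factor])
        n_control = len(control_links[factor])
        ratios[factor] = (n_lc - n_control) / (n_lc + n_control)
    return ratios


def rank_factors(ratios):
    """Sort factors by ratio in descending order (ties broken by name)."""
    return sorted(ratios.items(), key=lambda item: (-item[1], item[0]))


# Toy patient-factor relations (hypothetical data, not from the study).
toy_edges = [
    ("p1", "cough"), ("p2", "cough"), ("p3", "cough"),
    ("p1", "hypertension"), ("p3", "hypertension"), ("p4", "hypertension"),
]
toy_lc_patients = {"p1", "p2"}  # p3 and p4 act as controls

toy_ratios = connection_delta_ratios(toy_edges, toy_lc_patients)
toy_ranking = rank_factors(toy_ratios)
for factor, ratio in toy_ranking:
    print(f"{factor}: {ratio:+.3f}")
```

Under this assumed definition, a factor linked only to patients with LC scores +1, a factor linked only to controls scores -1, and a factor linked equally to both cohorts scores 0, so sorting in descending order places the factors most specific to the LC cohort first.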
