Contralateral Breast Cancer Event Detection Using Natural Language Processing

To facilitate the identification of contralateral breast cancer events in large cohort studies, we propose and implement a method based on two kinds of features: features extracted from the narrative text of progress notes, and the number of pathology reports for each breast side. The method collects medical concepts and their combinations from progress notes to detect contralateral events. In addition, the numbers of pathology reports generated for the left and for the right breast are derived as additional features. We trained a support vector machine on the derived features to detect contralateral events. In cross-validation and held-out tests, the area under the curve (AUC) was 0.93 and 0.89, respectively. Because feature generation is simple, the method should be straightforward to replicate.
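The feature generation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the concept lexicon `CONCEPT_PATTERNS`, the sentence splitting rule, and the function names are all assumptions made for illustration, since the abstract does not list the actual concepts or preprocessing. The sketch counts concept mentions and within-sentence concept pairs in progress notes, then adds the per-side pathology report counts.

```python
import re
from collections import Counter

# Hypothetical concept lexicon (assumption): the abstract does not give
# the actual medical concepts used by the method.
CONCEPT_PATTERNS = {
    "left": r"\bleft\b",
    "right": r"\bright\b",
    "bilateral": r"\bbilateral\b",
    "contralateral": r"\bcontralateral\b",
    "carcinoma": r"\b(?:carcinoma|cancer|malignancy)\b",
}


def note_features(notes):
    """Count concept mentions and within-sentence concept pairs."""
    counts = Counter()
    for note in notes:
        # Naive sentence split on punctuation and newlines (assumption).
        for sentence in re.split(r"[.!?\n]+", note.lower()):
            present = sorted(
                name
                for name, pattern in CONCEPT_PATTERNS.items()
                if re.search(pattern, sentence)
            )
            for name in present:
                counts[name] += 1
            # Concept combinations: every unordered pair in the sentence.
            for i, first in enumerate(present):
                for second in present[i + 1:]:
                    counts[f"{first}+{second}"] += 1
    return counts


def pathology_features(report_sides):
    """Count pathology reports for each breast side."""
    sides = Counter(report_sides)
    return {"n_path_left": sides["left"], "n_path_right": sides["right"]}


def patient_features(notes, report_sides):
    """Merge note-derived and pathology-derived features for one patient."""
    features = dict(note_features(notes))
    features.update(pathology_features(report_sides))
    return features


if __name__ == "__main__":
    example_notes = [
        "Left breast carcinoma treated in 2010. "
        "New mass in right breast, biopsy shows carcinoma."
    ]
    print(patient_features(example_notes, ["left", "left", "right"]))
```

A per-patient dictionary of this kind could then be converted to a numeric vector and passed to a support vector machine classifier; that step is omitted here because the abstract does not specify the kernel or other settings.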
