Lexical Acquisition for Clinical Text Mining Using Distributional Similarity

We describe experiments on the use of distributional similarity for acquiring lexical information from clinical free text, in particular notes typed by primary care physicians (general practitioners). We also present a novel approach to lexical acquisition from 'sensitive' text. It does not require the text to be manually anonymised, which is a very expensive process, and therefore allows much larger datasets to be used than would normally be possible.
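
The abstract does not say how distributional similarity is computed. The sketch below is a generic illustration only and is not the authors' pipeline. It builds a context vector for each word by counting the words that occur within a fixed window around it. It then ranks words by the cosine similarity of those vectors, so words used in similar contexts, such as an abbreviation and its expansion, score as similar. The window size, the cosine measure, the function names (`context_vectors`, `cosine`, `nearest_neighbours`) and the toy notes are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[ctx] for ctx, count in a.items() if ctx in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbours(target, vectors, k=3):
    """Rank the other words by the cosine similarity of their context vectors to `target`."""
    target_vec = vectors.get(target, Counter())
    scores = [(other, cosine(target_vec, vec))
              for other, vec in vectors.items() if other != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical, invented note fragments (not real patient data).
notes = [
    "c/o abdo pain and bloating",
    "c/o abdominal pain and nausea",
    "has back pain on bending",
    "took paracetamol for pain",
]
vectors = context_vectors([note.split() for note in notes], window=1)
print(nearest_neighbours("abdo", vectors, k=3))
```

On these invented notes, 'abdominal' comes out as the nearest neighbour of 'abdo', because both occur between 'c/o' and 'pain'. Raw counts also reward function words: 'and' scores above 'back' here. This is why practical systems usually weight contexts, for example with pointwise mutual information, or use grammatical-relation contexts instead of a plain word window.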
