Evaluating measures of redundancy in clinical texts.

Although information redundancy has been reported as an important problem for clinicians using electronic health records and clinical reports, the measurement of redundancy in clinical text has not been extensively investigated. We evaluated several automated techniques for quantifying redundancy in clinical documents against an expert-derived reference standard consisting of outpatient clinical documents. The technique with the best correlation with human ratings (82%) consisted of a modified dynamic programming alignment algorithm applied over a sliding window and augmented with (a) lexical normalization and (b) stopword removal. When this method was applied to the overall outpatient record, we found that information redundancy in clinical notes increased over time and that mean document redundancy scores for individual patient documents appeared to follow cyclical patterns corresponding to clinical events. These results show that outpatient documents contain large amounts of redundant information and that the development of effective redundancy measures warrants additional investigation.
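To make the general idea concrete, the following is a minimal sketch of a windowed alignment score with normalization and stopword removal. It is not the evaluated method: the abstract does not specify the alignment modifications, the window size, the normalization tool, or the stopword list. Here a longest-common-subsequence alignment stands in for the dynamic programming step, lowercasing stands in for lexical normalization, and the names `STOPWORDS`, `normalize`, `lcs_length`, `redundancy`, and the default `window=5` are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list (the source does not give one).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "with", "for", "on"}


def normalize(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords.

    Lowercasing is only a stand-in for real lexical normalization.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def lcs_length(a, b):
    """Longest common subsequence length of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def redundancy(new_text, prior_text, window=5):
    """Mean fraction of each sliding window of the new note that aligns to the prior note."""
    new, prior = normalize(new_text), normalize(prior_text)
    if not new or not prior:
        return 0.0
    window = min(window, len(new))
    scores = []
    for start in range(len(new) - window + 1):
        chunk = new[start:start + window]
        scores.append(lcs_length(chunk, prior) / window)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    earlier = "Patient reports chest pain."
    later = "The patient reports chest pain today."
    print(redundancy(later, earlier))
```

Under these assumptions a score of 1.0 means every window of the newer note aligns fully to the earlier one, and 0.0 means no shared tokens. A real system would likely need a clinically informed normalizer and a tuned window size.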
