Coreference resolution on entities and events for hospital discharge summaries

The wealth of medical information in electronic medical records (EMRs), together with Natural Language Processing (NLP) technologies that can automatically extract information from them, has opened the door to automatic monitoring of patient-care quality and to question-answering systems that assist medical practitioners. This thesis studies coreference resolution, an information extraction (IE) subtask that links together the mentions that refer to the same entity. Linking these mentions lets us track changes in the state of an entity and makes it possible to answer questions about the information thus gathered. We perform coreference resolution on one specific type of EMR, the hospital discharge summary, and treat it as a binary classification problem. Our approach yields insights into the features that are most critical for resolving coreference among entities in five medical semantic categories that commonly appear in discharge summaries.

Thesis Supervisor: Ozlem Uzuner
Title: Assistant Professor, SUNY
Thesis Supervisor: Peter Szolovits
Title: Professor
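One common way to cast coreference resolution as binary classification is the mention-pair formulation. Each pair of mentions becomes an instance labelled coreferent (1) or not (0), a classifier is trained on those instances, and entity chains are then built by linking each mention to an earlier mention that the classifier accepts. The sketch below illustrates only that general idea under stated assumptions: the `Mention` fields, the four pair features, the perceptron learner, and closest-first linking are all hypothetical choices for illustration, not the feature set or classifier used in this thesis.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    text: str
    category: str  # hypothetical semantic category label, e.g. "problem"
    position: int  # order of the mention within the summary
    chain_id: int  # gold entity id; used only to label training pairs


def pair_features(a: Mention, b: Mention) -> list[float]:
    """Hypothetical feature vector for an (earlier, later) mention pair."""
    ta, tb = a.text.lower(), b.text.lower()
    wa, wb = set(ta.split()), set(tb.split())
    jaccard = len(wa & wb) / (len(wa | wb) or 1)  # word-overlap similarity
    return [
        1.0 if ta == tb else 0.0,                  # exact string match
        jaccard,                                   # partial string match
        1.0 if a.category == b.category else 0.0,  # same semantic category
        1.0 / (1 + abs(b.position - a.position)),  # proximity: larger when closer
    ]


def make_instances(mentions: list[Mention]) -> tuple[list[list[float]], list[int]]:
    """Turn every mention pair into a labelled instance: 1 = coreferent, 0 = not."""
    X, y = [], []
    for a, b in combinations(mentions, 2):  # a always precedes b in the list
        X.append(pair_features(a, b))
        y.append(1 if a.chain_id == b.chain_id else 0)
    return X, y


def score(w: list[float], bias: float, x: list[float]) -> float:
    """Linear decision score; > 0 means 'coreferent'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + bias


def train_perceptron(X, y, epochs: int = 20, lr: float = 0.1):
    """Plain perceptron, a stand-in for whatever binary classifier is used."""
    w, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = 1 if score(w, bias, x) > 0 else 0
            if pred != label:  # update weights only on mistakes
                step = lr * (label - pred)
                w = [wj + step * xj for wj, xj in zip(w, x)]
                bias += step
    return w, bias


def resolve(mentions: list[Mention], w, bias) -> dict[int, int]:
    """Closest-first linking: attach each mention to the nearest earlier
    mention the classifier calls coreferent, if there is one."""
    links = {}
    for j in range(1, len(mentions)):
        for i in range(j - 1, -1, -1):
            if score(w, bias, pair_features(mentions[i], mentions[j])) > 0:
                links[j] = i
                break
    return links


if __name__ == "__main__":
    # Toy, made-up mentions; not drawn from any real discharge summary.
    mentions = [
        Mention("chest pain", "problem", 0, 0),
        Mention("aspirin", "treatment", 1, 1),
        Mention("the chest pain", "problem", 2, 0),
        Mention("aspirin", "treatment", 3, 1),
    ]
    X, y = make_instances(mentions)
    w, bias = train_perceptron(X, y)
    print(resolve(mentions, w, bias))  # → {2: 0, 3: 1}
```

Separating pairwise classification from the linking step keeps the classifier a plain binary learner; the price is that pairwise decisions are made independently, so the linking strategy (closest-first here) determines how they are assembled into chains.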