Paper information - Natural language processing: an introduction

Natural language processing: an introduction

OBJECTIVES: To provide an overview and tutorial of natural language processing (NLP) and modern NLP-system design.

TARGET AUDIENCE: This tutorial targets the medical informatics generalist who has limited acquaintance with the principles behind NLP and/or limited knowledge of the current state of the art.

SCOPE: We describe the historical evolution of NLP, and summarize common NLP sub-problems in this extensive field. We then provide a synopsis of selected highlights of medical NLP efforts. After providing a brief description of common machine-learning approaches that are being used for diverse NLP sub-problems, we discuss how modern NLP architectures are designed, with a summary of the Apache Foundation's Unstructured Information Management Architecture. We finally consider possible future directions for NLP, and reflect on the possible impact of IBM Watson on the medical field.

[1] John Hutchins,et al. The first public demonstration of machine translation: the Georgetown-IBM system, 7th January 1954 , 2006 .

[2] Tughrul Arslan,et al. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2000) , 2000 .

[3] Peter J. Haug,et al. ONYX: A System for the Semantic Analysis of Clinical Text , 2009, BioNLP@HLT-NAACL.

[4] Craig A. Morioka,et al. IndexFinder: A Method of Extracting Key Concepts from Clinical Texts for Indexing , 2003, AMIA.

[5] Andrew McCallum,et al. Gene Prediction with Conditional Random Fields , 2005 .

[6] Cui Tao,et al. Time-Oriented Question Answering from Clinical Narratives Using Semantic-Web Techniques , 2010, SEMWEB.

[7] Peter J. Haug,et al. MPLUS: a probabilistic medical language understanding system , 2002, ACL Workshop on Natural Language Processing in the Biomedical Domain.

[8] D. Lindberg,et al. Unified Medical Language System , 2020, Definitions.

[9] David Scott Warren,et al. Programming in Tabled Prolog , 1995 .

[10] J. Ross Quinlan,et al. C4.5: Programs for Machine Learning , 1992 .

[11] Wendy W. Chapman,et al. Fever detection from free-text clinical records for biosurveillance , 2004, Journal of Biomedical Informatics.

[12] Alan R. Aronson,et al. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program , 2001, AMIA.

[13] Jeffrey E. F. Friedl. Mastering Regular Expressions , 1997 .

[14] Hongfang Liu,et al. A study of abbreviations in MEDLINE abstracts , 2002, AMIA.

[15] Raymond J. Mooney,et al. Active Learning for Natural Language Parsing and Information Extraction , 1999, ICML.

[16] P. Haug,et al. Computerized extraction of coded findings from free-text radiologic reports. Work in progress. , 1990, Radiology.

[17] Yuan Luo,et al. Identifying patient smoking status from medical discharge records. , 2008, Journal of the American Medical Informatics Association : JAMIA.

[18] Randolph A. Miller,et al. Review: Medical Diagnostic Decision Support Systems - Past, Present, And Future: A Threaded Bibliography and Brief Commentary , 1994, J. Am. Medical Informatics Assoc..

[19] Xiaoyan Wang,et al. Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. , 2009, Journal of the American Medical Informatics Association : JAMIA.

[20] Andreas Christmann,et al. Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.

[21] Andrew McCallum,et al. An Introduction to Conditional Random Fields for Relational Learning , 2007 .

[22] Özlem Uzuner,et al. Extracting medication information from clinical text , 2010, J. Am. Medical Informatics Assoc..

[23] Prakash M. Nadkarni,et al. Research Paper: Use of General-purpose Negation Detection to Augment Concept Indexing of Medical Documents: A Quantitative Study Using the UMLS , 2001, J. Am. Medical Informatics Assoc..

[24] Timothy M. Franz,et al. Enhancement of clinicians' diagnostic reasoning by computer-based consultation: a multisite study of 2 systems. , 1999, JAMA.

[25] Paul Fodor,et al. Natural Language Processing With Prolog in the IBM Watson System , 2011 .

[26] Noam Chomsky,et al. On Certain Formal Properties of Grammars , 1959, Inf. Control..

[27] Wendy W. Chapman,et al. A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries , 2001, J. Biomed. Informatics.

[28] Paul Fodor,et al. The Prolog Interface to the Unstructured Information Management Architecture , 2008, ArXiv.

[29] Wendy W. Chapman,et al. Identifying Respiratory Findings in Emergency Department Reports for Biosurveillance using MetaMap , 2004, MedInfo.

[30] Yang Huang,et al. A novel hybrid approach to automated negation detection in clinical radiology reports. , 2007, Journal of the American Medical Informatics Association : JAMIA.

[31] Su Jian,et al. Exploring Deep Knowledge Resources in Biomedical Name Recognition , 2004, NLPBA/BioNLP.

[32] Jian Su,et al. Exploring Deep Knowledge Resources in Biomedical Name Recognition , 2004, NLPBA/BioNLP.

[33] Tony Mason,et al. Lex & Yacc , 1992 .

[34] Stuart M. Shieber,et al. Foundational issues in natural language processing , 1991 .

[35] Noam Chomsky. Three models for the description of language , 1956, IRE Trans. Inf. Theory.

[36] Allen C. Browne,et al. Evaluating lexical variant generation to improve information retrieval , 1998, AMIA.

[37] Peter Spyns. Natural Language Processing in Medicine: An Overview , 1996, Methods of Information in Medicine.

[38] A. L. Baker,et al. Performance of four computer-based diagnostic systems. , 1994, The New England journal of medicine.

[39] Florentina Hristea. Statistical Natural Language Processing , 2011, International Encyclopedia of Statistical Science.

[40] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[41] James H. Martin,et al. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.

[42] Dan Jurafsky,et al. Statistical Natural Language Processing , 2010, Encyclopedia of Machine Learning.

[43] Mark Hasegawa-Johnson,et al. Multivariate-state hidden Markov models for simultaneous transcription of phones and formants , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[44] Andrew J. Viterbi,et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[45] Christopher D. Manning,et al. Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[46] S C Kleene,et al. Representation of Events in Nerve Nets and Finite Automata , 1951 .

[47] David J. Weir,et al. The convergence of mildly context-sensitive grammar formalisms , 1990 .

[48] T C Rindflesch,et al. Ambiguity resolution while mapping free text to the UMLS Metathesaurus. , 1994, Proceedings. Symposium on Computer Applications in Medical Care.

[49] M. Borodovsky,et al. GeneMark.hmm: new solutions for gene finding. , 1998, Nucleic acids research.

[50] Wendy W. Chapman,et al. Evaluation of negation phrases in narrative clinical reports , 2001, AMIA.

[51] Brian W. Kernighan,et al. The UNIX™ programming environment , 1979, Softw. Pract. Exp..

[52] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[53] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[54] Özlem Uzuner,et al. Viewpoint Paper: Recognizing Obesity and Comorbidities in Sparse Data , 2009, J. Am. Medical Informatics Assoc..

[55] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[56] Charles Elkan. Log-linear models and conditional random fields , 2007 .

[57] Sunghwan Sohn,et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications , 2010, J. Am. Medical Informatics Assoc..

[58] Alberto Maria Segre,et al. Programs for Machine Learning , 1994 .

[59] George Hripcsak,et al. Using empiric semantic correlation to interpret temporal assertions in clinical texts. , 2009, Journal of the American Medical Informatics Association : JAMIA.

[60] Pierre Zweigenbaum,et al. Morphosemantic parsing of medical compound words: Transferring a French analyzer to English , 2009, Int. J. Medical Informatics.

[61] Randolph A. Miller,et al. Research Paper: Evaluation of a Method to Identify and Categorize Section Headers in Clinical Documents , 2009, J. Am. Medical Informatics Assoc..

[62] Jian Su,et al. Enhancing HMM-based biomedical named entity recognition by studying special phenomena , 2004, J. Biomed. Informatics.

[63] János Csirik,et al. The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text , 2010, CoNLL Shared Task.

[64] Dan Klein,et al. Accurate Unlexicalized Parsing , 2003, ACL.

[65] Shuying Shen,et al. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text , 2011, J. Am. Medical Informatics Assoc..

[66] R A Miller,et al. The Demise of the “Greek Oracle” Model for Medical Diagnostic Systems , 1990, Methods of Information in Medicine.

[67] Wendy W. Chapman,et al. Anaphoric relations in the clinical narrative: corpus creation , 2011, J. Am. Medical Informatics Assoc..

[68] L. Tick,et al. Medical Language Processing: Applications to Patient Data Representation and Automatic Encoding , 1995, Methods of Information in Medicine.

[69] Carol Friedman,et al. Research Paper: Methods for Building Sense Inventories of Abbreviations in Clinical Notes , 2009, J. Am. Medical Informatics Assoc..

[70] Hinrich Schütze,et al. Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[71] Marc Weeber,et al. Developing a test collection for biomedical word sense disambiguation , 2001, AMIA.

[72] Xiaoyan Wang,et al. Characterizing environmental and phenotypic associations using information theory and electronic health records , 2009, BMC Bioinformatics.

[73] Dustin Boswell,et al. Introduction to Support Vector Machines , 2002 .

[74] Jonathan G. Goldin,et al. A concept-based retrieval system for thoracic radiology , 1996, Journal of Digital Imaging.

[75] Nello Cristianini,et al. An introduction to Support Vector Machines , 2000 .

[76] Graham Cormode,et al. Discrete methods in epidemiology , 2007 .

[77] Carol Friedman,et al. Extracting Phenotypic Information from the Literature via Natural Language Processing , 2004, MedInfo.

[78] Christopher G. Chute,et al. The horizontal and vertical nature of patient phenotype retrieval: new directions for clinical text processing , 2002, AMIA.

[79] Sean R. Eddy,et al. Pfam: multiple sequence alignments and HMM-profiles of protein domains , 1998, Nucleic Acids Res..