Paper information - The automated retrieval console (ARC): open source software for streamlining the process of natural language processing


Open source natural language processing (NLP) frameworks have made it easier for NLP developers and researchers to develop more reusable and modular components and to capitalize on the work of others. With the Automated Retrieval Console (ARC) we attempt to build upon this foundation by streamlining the many processes surrounding the development, evaluation, and deployment of natural language processing technologies. Toward this end, ARC offers graphical user interfaces to facilitate corpus import, reference set creation, annotation, and inter-annotator agreement calculation. To speed task-specific information extraction development, ARC combines NLP-generated features from UIMA pipelines with machine learning classifiers and calculates performance statistics against a reference set. We also use ARC to explore automated algorithm creation for specific information extraction tasks in an effort to reduce the need for custom code and rules development. We present a detailed description of the ideas implemented in this proof-of-concept and a brief overview of two empirical evaluations.
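The abstract mentions two calculations without naming the measures: inter-annotator agreement and performance statistics against a reference set. The following is a minimal sketch of what such calculations could look like, assuming Cohen's kappa for agreement between two annotators and precision, recall, and F1 for performance. The function names `performance_stats` and `cohen_kappa` are hypothetical and are not taken from ARC.

```python
from collections import Counter


def performance_stats(reference, predicted, positive=True):
    """Precision, recall and F1 of predicted labels against a reference set.

    Illustrative only: the abstract does not say which statistics ARC reports.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == positive and p == positive)
    fp = sum(1 for r, p in pairs if r != positive and p == positive)
    fn = sum(1 for r, p in pairs if r == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same documents.

    Assumed agreement measure; the abstract only says "inter-annotator
    agreement calculation".
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of documents labelled identically.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For example, `performance_stats(reference_labels, classifier_labels)` would score a classifier's document-level output against the annotated reference set, and `cohen_kappa(annotator_1, annotator_2)` would summarize how consistently two annotators built that reference set.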

Leonard W. D'Avolio | Thien M. Nguyen | Louis D. Fiore
