The automated retrieval console (ARC): open source software for streamlining the process of natural language processing

Open source natural language processing (NLP) frameworks have made it easier for NLP developers and researchers to build reusable, modular components and to capitalize on the work of others. With the Automated Retrieval Console (ARC), we attempt to build on this foundation by streamlining the many processes surrounding the development, evaluation, and deployment of NLP technologies. To this end, ARC offers graphical user interfaces that facilitate corpus import, reference set creation, annotation, and inter-annotator agreement calculation. To speed the development of task-specific information extraction, ARC combines NLP-generated features from UIMA pipelines with machine learning classifiers and calculates performance statistics against a reference set. We also use ARC to explore automated algorithm creation for specific information extraction tasks, in an effort to reduce the need to develop custom code and rules. We present a detailed description of the ideas implemented in this proof of concept and a brief overview of two empirical evaluations.
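The abstract does not say which agreement or performance statistics ARC reports. As a minimal sketch, assuming binary document-level labels (for example "yes"/"no" for whether a document mentions a target concept), the two calculations it names could look like this: Cohen's kappa for inter-annotator agreement, and precision, recall, and F1 for classifier output scored against a reference set. The function names `cohen_kappa` and `score_against_reference` and the `positive` parameter are hypothetical illustrations, not part of ARC.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators who labeled the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so we adopt the convention of reporting perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def score_against_reference(predicted, reference, positive="yes"):
    """Precision, recall, and F1 of predicted labels versus a reference set (hypothetical helper)."""
    if len(predicted) != len(reference):
        raise ValueError("predictions and reference must cover the same items")
    pairs = list(zip(predicted, reference))
    tp = sum(p == positive and r == positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    annotator_a = ["yes", "yes", "no", "no", "yes", "no"]
    annotator_b = ["yes", "no", "no", "no", "yes", "no"]
    print(round(cohen_kappa(annotator_a, annotator_b), 4))  # 0.6667
    # Treat annotator_b as the reference set and annotator_a as classifier output.
    print(score_against_reference(annotator_a, annotator_b))
```

Here the two annotators agree on five of six documents (observed agreement 0.833), but their label frequencies imply 0.5 agreement by chance, so kappa is 0.667. Kappa is shown because it corrects raw agreement for chance; a real tool might instead report other measures, such as per-class or span-level statistics.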