What Happens When?: Interpreting Schedule of Activity Tables in Clinical Trial Documents

Clinical trial protocols are complex documents that must be manually translated for trial execution and management. We have developed a system that automatically transforms a schedule of activity (SOA) table from a PDF document into a machine-interpretable form. The system combines semantic, structural, and natural language processing (NLP) approaches, with a human in the loop for verification, first to determine which cells contain activity or temporal information and then to interpret what those cells represent. Using a training and test set of 20 protocols, we assess the accuracy of identifying specific types of SOA elements. This work is the first stage of a larger effort to use artificial intelligence techniques to extract procedural logic from clinical trial documents and to build a knowledge base of protocols that supports insights and comparisons across studies.
