Process-Level Representation of Scientific Protocols with Interactive Annotation

We develop Process Execution Graphs (PEG), a document-level representation of real-world wet-lab biochemistry protocols, addressing challenges such as cross-sentence relations, long-range coreference, grounding, and implicit arguments. We manually annotate PEGs in a corpus of complex lab protocols with a novel interactive textual simulator that keeps track of entity traits and semantic constraints during annotation. We use this data to develop graph-prediction models, finding them to be good at entity identification and local relation extraction, while our corpus facilitates further exploration of challenging long-range relations.
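The abstract does not spell out PEG's node and edge inventory or how the simulator is implemented. The following is therefore only a minimal Python sketch of the general idea: annotating a protocol as a graph of actions and entities while a simulator tracks entity traits and rejects annotations that violate constraints. Everything in it is hypothetical and illustrative, not the paper's schema or tool. That includes the classes `Entity`, `ActionNode` and `ToySimulator`, the verbs `transfer` and `incubate`, the roles `object` and `destination`, and the "destination must be a container" constraint.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # A tracked object (reagent, mixture, container) and its current traits.
    name: str
    traits: dict = field(default_factory=dict)


@dataclass
class ActionNode:
    # One protocol step. `args` maps (hypothetical) role labels to entity
    # names; `settings` holds non-entity arguments such as a temperature.
    verb: str
    args: dict
    settings: dict = field(default_factory=dict)


class ToySimulator:
    """Hypothetical trait-tracking simulator: validates each annotated action,
    updates entity traits, and records argument and entity-flow edges."""

    def __init__(self):
        self.entities = {}
        self.actions = []
        self.arg_edges = []   # (step, role, entity name)
        self.flow_edges = []  # (earlier step, later step, entity name)
        self._last_step = {}  # entity name -> most recent step that used it

    def add_entity(self, name, **traits):
        self.entities[name] = Entity(name, dict(traits))

    def apply(self, action):
        # Validate everything before mutating state, so a rejected
        # annotation leaves the graph unchanged.
        for role, name in action.args.items():
            if name not in self.entities:
                raise ValueError(f"{action.verb}: unknown entity {name!r} ({role})")
        if action.verb == "transfer":
            dest = self.entities[action.args["destination"]]
            if not dest.traits.get("is_container", False):
                raise ValueError(f"transfer: {dest.name!r} is not a container")

        step = len(self.actions)
        # Trait updates for the two toy verbs.
        if action.verb == "transfer":
            moved = self.entities[action.args["object"]]
            moved.traits["location"] = action.args["destination"]
        elif action.verb == "incubate":
            self.entities[action.args["object"]].traits.update(action.settings)

        # Argument edges, plus flow edges linking this step to the last step
        # that touched the same entity (possibly many sentences earlier).
        for role, name in action.args.items():
            self.arg_edges.append((step, role, name))
            prev = self._last_step.get(name)
            if prev is not None and prev != step:
                self.flow_edges.append((prev, step, name))
            self._last_step[name] = step
        self.actions.append(action)
        return step


if __name__ == "__main__":
    sim = ToySimulator()
    sim.add_entity("buffer", location="bottle")
    sim.add_entity("tube", is_container=True)
    sim.apply(ActionNode("transfer", {"object": "buffer", "destination": "tube"}))
    sim.apply(ActionNode("incubate", {"object": "buffer"}, {"temperature_c": 37}))
    print(sim.entities["buffer"].traits)  # {'location': 'tube', 'temperature_c': 37}
    print(sim.flow_edges)                 # [(0, 1, 'buffer')]
```

Because the simulator validates before it mutates anything, a rejected annotation leaves the graph untouched. That is one plausible way an interactive tool could give annotators immediate feedback. The flow edges show how a step can be linked to an earlier step that used the same entity even when the two are far apart in the text.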
