Nominalization and Alternations in Biomedical Language

Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of "stimulate" in "FSH stimulates follicular development" and "follicular development is stimulated by FSH". The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts.

Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zero-related nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle.

Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedical language. We also report on a previously undescribed alternation involving an adjectival present participle.
