Advances in Machine Learning and Data Mining for Astronomy

Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers, illustrating the application of state-of-the-art machine learning and data mining techniques in astronomy. Because of the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context for issues in the astronomical sciences that are also important to the health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

[1]  Gareth J. F. Jones,et al.  Adaptation of machine translation for multilingual information retrieval in the medical domain , 2014, Artif. Intell. Medicine.

[2]  Vasileios Hatzivassiloglou,et al.  Translating Collocations for Bilingual Lexicons: A Statistical Approach , 1996, CL.

[3]  Carl J. Schmidt,et al.  Mining the Biomedical Literature for Genic Information , 2008, BioNLP.

[4]  Yue Wang,et al.  The Genia Event Extraction Shared Task, 2013 Edition - Overview , 2013, BioNLP@ACL.

[5]  Cathy H. Wu,et al.  The eFIP system for text mining of protein interaction networks of phosphorylated proteins , 2012, Database J. Biol. Databases Curation.

[6]  W. Bruce Croft,et al.  Phrasal translation and query expansion techniques for cross-language information retrieval , 1997, SIGIR '97.

[7]  Jin-Dong Kim,et al.  Exploring Domain Differences for the Design of a Pronoun Resolution System for Biomedical Text , 2008, COLING.

[8]  Nigel Collier,et al.  Zone analysis in biology articles as a basis for information extraction , 2006, Int. J. Medical Informatics.

[9]  Edward A. Lee,et al.  CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 2000; 00:1–7 Prepared using cpeauth.cls [Version: 2002/09/19 v2.02] Taverna: Lessons in creating , 2022 .

[10]  Carole A. Goble,et al.  BioCatalogue: a universal catalogue of web services for the life sciences , 2010, Nucleic Acids Res..

[11]  A. Banday,et al.  Mining the Sky , 2001 .

[12]  B. Zuckerman,et al.  Errors in medical interpretation and their potential clinical consequences in pediatric encounters. , 2003, Pediatrics.

[13]  Jun'ichi Tsujii,et al.  GENIA corpus - a semantically annotated corpus for bio-textmining , 2003, ISMB.

[14]  Lorraine K. Tanabe,et al.  GENETAG: a tagged corpus for gene/protein named entity recognition , 2005, BMC Bioinformatics.

[15]  Sophia Ananiadou,et al.  Analysing Entity Type Variation across Biomedical Subdomains , 2012 .

[16]  Zhiyong Lu,et al.  PubTator: a web-based text mining tool for assisting biocuration , 2013, Nucleic Acids Res..

[17]  Ralph Grishman,et al.  Message Understanding Conference- 6: A Brief History , 1996, COLING.

[18]  Thomas S. Morton,et al.  WordFreak: An Open Tool for Linguistic Annotation , 2003, HLT-NAACL.

[19]  S. Y. Fung,et al.  Factors associated with breast self-examination behaviour among Chinese women in Hong Kong. , 1998, Patient education and counseling.

[20]  Yuan Luo,et al.  Identifying patient smoking status from medical discharge records. , 2008, Journal of the American Medical Informatics Association : JAMIA.

[21]  Sophia Ananiadou,et al.  Deploying and sharing U-Compare workflows as web services , 2013, Journal of Biomedical Semantics.

[22]  Jari Björne,et al.  BioInfer: a corpus for information extraction in the biomedical domain , 2007, BMC Bioinformatics.

[23]  Alexander A. Morgan,et al.  BioCreAtIvE Task 1A: gene mention finding evaluation , 2005, BMC Bioinformatics.

[24]  Margaret-Anne D. Storey,et al.  An Interactive Tool for Visualizing Design Heterogeneity in Clinical Trials , 2008, AMIA.

[25]  Hiroshi Nakagawa Automatic term recognition based on statistics of compound nouns , 2000 .

[26]  Sayori Shimohata,et al.  Retrieving Collocations by Co-Occurrences and Word Order Constraints , 1997, ACL.

[27]  Sophia Ananiadou,et al.  Construction of an annotated corpus to support biomedical information extraction , 2009, BMC Bioinformatics.

[28]  Sunghwan Sohn,et al.  Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications , 2010, J. Am. Medical Informatics Assoc..

[29]  Textpresso text mining : semi-automated curation of protein subcellular localization using the Gene Ontology ’ s Cellular Component Ontology , 2012 .

[30]  Martijn J. Schuemie,et al.  Distribution of information in biomedical abstracts and full-text publications , 2004, Bioinform..

[31]  Ron Artstein,et al.  Survey Article: Inter-Coder Agreement for Computational Linguistics , 2008, CL.

[32]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[33]  P. Bork,et al.  Literature mining for the biologist: from information retrieval to biological discovery , 2006, Nature Reviews Genetics.

[34]  John P. Overington,et al.  ChEMBL: a large-scale bioactivity database for drug discovery , 2011, Nucleic Acids Res..

[35]  Hans-Michael Müller,et al.  Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature , 2004, PLoS biology.

[36]  Manabu Torii,et al.  SORTAL ANAPHORA RESOLUTION IN MEDLINE ABSTRACTS , 2007, Comput. Intell..

[37]  Scott Weinstein,et al.  Centering: A Framework for Modeling the Local Coherence of Discourse , 1995, CL.

[38]  Sophia Ananiadou,et al.  NaCTeM EventMine for BioNLP 2013 CG and PC tasks , 2013, BioNLP@ACL.

[39]  Kalpana Raja,et al.  PPInterFinder—a mining tool for extracting causal relations on human proteins from literature , 2013, Database J. Biol. Databases Curation.

[40]  Ruth L. Seal,et al.  Annotation of anaphoric relations in biomedical full-text articles using a domain-relevant scheme , 2007 .

[41]  Simone Teufel,et al.  Argumentative zoning information extraction from scientific text , 1999 .

[42]  Dietrich Rebholz-Schuhmann,et al.  Gene Regulation Ontology (GRO): Design Principles and Use Cases , 2008, MIE.

[43]  Christoph Müller,et al.  Multi-level annotation of linguistic data with MMAX 2 , 2006 .

[44]  Michael Strube,et al.  End-to-End Coreference Resolution via Hypergraph Partitioning , 2010, COLING.

[45]  Sebastian Martschat,et al.  Multigraph Clustering for Unsupervised Coreference Resolution , 2013, ACL.

[46]  Jun'ichi Tsujii,et al.  Corpus annotation for mining biomedical events from literature , 2008, BMC Bioinformatics.

[47]  James R. Curran,et al.  Challenges for automatically extracting molecular interactions from full-text articles , 2009, BMC Bioinformatics.

[48]  James Evans,et al.  Solar Anomaly and Planetary Displays in the Antikythera Mechanism , 2010 .

[49]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[50]  Adam Wright,et al.  An automated technique for identifying associations between medications, laboratory results and problems , 2010, J. Biomed. Informatics.

[51]  Alan R. Aronson,et al.  Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program , 2001, AMIA.

[52]  Jian Su,et al.  An NP-Cluster Based Approach to Coreference Resolution , 2004, COLING.

[53]  Hong Yu,et al.  The biomedical discourse relation bank , 2011, BMC Bioinformatics.

[54]  Alla Keselman,et al.  Can Multilingual Machine Translation Help Make Medical Record Content More Comprehensible to Patients? , 2010, MedInfo.

[55]  Emmanuel Morin,et al.  Revising the Compositional Method for Terminology Acquisition from Comparable Corpora , 2012, COLING.

[56]  Holger Schwenk,et al.  Parallel sentence generation from comparable corpora for improved SMT , 2011, Machine Translation.

[57]  David A. Ferrucci,et al.  UIMA: an architectural approach to unstructured information processing in the corporate research environment , 2004, Natural Language Engineering.

[58]  Beatrice Santorini,et al.  The Penn Treebank: An Overview , 2003 .

[59]  Joel D. Martin,et al.  PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine , 2003, BMC Bioinformatics.

[60]  Halil Kilicoglu,et al.  A High-Precision Approach to Detecting Hedges and their Scopes , 2010, CoNLL Shared Task.

[61]  Sophia Ananiadou,et al.  How to make the most of NE dictionaries in statistical NER , 2008, BMC Bioinformatics.

[62]  Alexander A. Morgan,et al.  Gene Name Extraction Using FlyBase Resources , 2003, BioNLP@ACL.

[63]  Breck Baldwin,et al.  Description of the UPENN CAMP System as Used for Coreference , 1998, MUC.

[64]  Daniel Jurafsky,et al.  Is Knowledge-Free Induction of Multiword Unit Dictionary Headwords a Solved Problem? , 2001, EMNLP.

[65]  George Hripcsak,et al.  Automating a severity score guideline for community-acquired pneumonia employing medical language processing of discharge summaries , 1999, AMIA.

[66]  Proux,et al.  Detecting Gene Symbols and Names in Biological Texts: A First Step toward Pertinent Information Extraction. , 1998, Genome informatics. Workshop on Genome Informatics.

[67]  Jun'ichi Tsujii,et al.  Part-of-Speech Annotation of Biology Research Abstracts , 2004, LREC.

[68]  Ted Briscoe,et al.  Weakly Supervised Learning for Hedge Classification in Scientific Literature , 2007, ACL.

[69]  Dietrich Rebholz-Schuhmann,et al.  Using argumentation to extract key sentences from biomedical abstracts , 2007, Int. J. Medical Informatics.

[70]  Sophia Ananiadou,et al.  Discovering and visualizing indirect associations between biomedical concepts , 2011, Bioinform..

[71]  Max Mühlhäuser,et al.  Darmstadt Knowledge Processing Repository Based on UIMA , 2007 .

[72]  Thomas C. Wiegers,et al.  Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database , 2013, PloS one.

[73]  Ted Briscoe,et al.  Integrating Natural Language Processing with Flybase Curation , 2006, Pacific Symposium on Biocomputing.

[74]  Sophia Ananiadou,et al.  Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications , 2013, ACL.

[75]  Sophia Ananiadou,et al.  Extracting semantically enriched events from biomedical literature , 2012, BMC Bioinformatics.

[76]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[77]  L M Lau,et al.  A natural language understanding system combining syntactic and semantic techniques. , 1994, Proceedings. Symposium on Computer Applications in Medical Care.

[78]  Naoaki Okazaki,et al.  Automatic Acquisition of Huge Training Data for Bio-Medical Named Entity Recognition , 2011, BioNLP@ACL.

[79]  Alexander A. Morgan,et al.  Overview of BioCreAtIvE task 1B: normalized gene lists , 2005, BMC Bioinformatics.

[80]  Zhiyong Lu,et al.  NCBI disease corpus: A resource for disease name recognition and concept normalization , 2014, J. Biomed. Informatics.

[81]  Sophia Ananiadou,et al.  Recognising Discourse Causality Triggers in the Biomedical Domain , 2013, J. Bioinform. Comput. Biol..

[82]  Sophia Ananiadou,et al.  Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora , 2014, EACL.

[83]  Salvador Capella-Gutiérrez,et al.  PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions , 2010, Nucleic Acids Res..

[84]  Yang Huang,et al.  Combining text classification and Hidden Markov Modeling techniques for categorizing sentences in randomized clinical trial abstracts. , 2006, AMIA ... Annual Symposium proceedings. AMIA Symposium.

[85]  Naoaki Okazaki,et al.  Identifying Sections in Scientific Abstracts using Conditional Random Fields , 2008, IJCNLP.

[86]  Emmanuel Morin,et al.  Compositionality and lexical alignment of multi-word terms , 2010, Lang. Resour. Evaluation.

[87]  Jari Björne,et al.  TEES 2.1: Automated Annotation Scheme Learning in the BioNLP 2013 Shared Task , 2013, BioNLP@ACL.

[88]  Sophia Ananiadou,et al.  Building a semantically annotated corpus for congestive heart and renal failure from clinical records and the literature , 2014, Louhi@EACL.

[89]  Sophia Ananiadou,et al.  Chemistry-specific Features and Heuristics for Developing a CRF-based Chemical Named Entity Recogniser , 2013 .

[90]  Sampo Pyysalo,et al.  Event extraction across multiple levels of biological organization , 2012, Bioinform..

[91]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[92]  Sophia Ananiadou,et al.  Customisable Curation Workflows in Argo , 2013 .

[93]  Grace Yuet-Chee Chung,et al.  Sentence retrieval for abstracts of randomized controlled trials , 2009, BMC Medical Informatics Decis. Mak..

[94]  Fei Xia,et al.  Corpus Annotation : Challenges and Strategies , 2012 .

[95]  Miguel A. Andrade-Navarro,et al.  Information extraction from full text scientific articles: Where are the keywords? , 2003, BMC Bioinformatics.

[96]  Keun Ho Ryu,et al.  An Active Co-Training Algorithm for Biomedical Named-Entity Recognition , 2012, J. Inf. Process. Syst..

[97]  Sampo Pyysalo,et al.  BioCause: Annotating and analysing causality in the biomedical domain , 2013, BMC Bioinformatics.

[98]  Chris Callison-Burch,et al.  Combining Bilingual and Comparable Corpora for Low Resource Machine Translation , 2013, WMT@ACL.

[99]  Fei Xia,et al.  Preliminary Experiments with Amazon’s Mechanical Turk for Annotating Medical Named Entities , 2010, Mturk@HLT-NAACL.

[100]  Burkhard Rost,et al.  Protein names precisely peeled off free text , 2004, ISMB/ECCB.

[101]  John Cocke,et al.  A Statistical Approach to Machine Translation , 1990, CL.

[102]  L. M. Faltz,et al.  Boolean semantics for natural language , 1984 .

[103]  Kenneth Ward Church,et al.  Termight: Identifying and Translating Technical Terminology , 1994, ANLP.

[104]  Sophia Ananiadou,et al.  Making UIMA Truly Interoperable with SPARQL , 2013, LAW@ACL.

[105]  Anupam Basu,et al.  An Agreement Measure for Determining Inter-Annotator Reliability of Human Judgements on Affective Text , 2008, Proceedings of the Workshop on Human Judgements in Computational Linguistics - HumanJudge '08.

[106]  Jun'ichi Tsujii,et al.  Improving the performance of dictionary-based approaches in protein name recognition , 2004, J. Biomed. Informatics.

[107]  A. Valencia,et al.  Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge , 2008, Genome Biology.

[108]  Shuying Shen,et al.  2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text , 2011, J. Am. Medical Informatics Assoc..

[109]  Satoshi Sekine,et al.  A survey of named entity recognition and classification , 2007 .

[110]  K. Hyland,et al.  Writing Without Conviction? Hedging in Science Research Articles , 1996 .

[111]  Dietrich Rebholz-Schuhmann,et al.  Text processing through Web services: calling Whatizit , 2008, Bioinform..

[112]  Andrew McCallum,et al.  Robust Biomedical Event Extraction with Dual Decomposition and Minimal Domain Adaptation , 2011, BioNLP@ACL.

[113]  Zhiyong Lu,et al.  NCBI at 2013 ShARe/CLEF eHealth Shared Task: Disorder Normalization in Clinical Notes with Dnorm , 2013, CLEF.

[114]  Dina Demner-Fushman,et al.  Biomedical Text Mining: A Survey of Recent Progress , 2012, Mining Text Data.

[115]  Dongseop Kwon,et al.  BioQRator : a web-based interactive biomedical literature curating system , 2013 .

[116]  Teruyoshi Hishiki,et al.  Extraction of Gene-Disease Relations from Medline Using Domain Dictionaries and Machine Learning , 2005, Pacific Symposium on Biocomputing.

[117]  Joel D. Martin,et al.  ExaCT: automatic extraction of clinical trial characteristics from journal publications , 2010, BMC Medical Informatics Decis. Mak..

[118]  Sophia Ananiadou,et al.  Meta-Knowledge Annotation at the Event Level: Comparison between Abstracts and Full Papers , 2012, LREC 2012.

[119]  Seth Kulick,et al.  Integrated Annotation for Biomedical Information Extraction , 2004, HLT-NAACL 2004.

[120]  Sampo Pyysalo,et al.  Overview of the Infectious Diseases (ID) task of BioNLP Shared Task 2011 , 2011, BioNLP@ACL.

[121]  K Bretonnel Cohen,et al.  Journal of Biomedical Discovery and Collaboration Open Access an Open-source Framework for Large-scale, Flexible Evaluation of Biomedical Text Mining Systems , 2008 .

[122]  Ted Briscoe,et al.  Combining Manual Rules and Supervised Learning for Hedge Cue and Scope Detection , 2010, CoNLL Shared Task.

[123]  Jun'ichi Tsujii,et al.  New challenges for text mining: mapping between text and manually curated pathways , 2008, BMC Bioinformatics.

[124]  Jeyakumar Natarajan,et al.  An overview of the BioCreative 2012 Workshop Track III: interactive text mining task , 2013, Database J. Biol. Databases Curation.

[125]  George Hripcsak,et al.  Measuring agreement in medical informatics reliability studies , 2002, J. Biomed. Informatics.

[126]  Tapio Salakoski,et al.  EVEX: A PubMed-Scale Resource for Homology-Based Generalization of Text Mining Predictions , 2011, BioNLP@ACL.

[127]  K. Bretonnel Cohen,et al.  U-Compare: A modular NLP workflow construction and evaluation system , 2011, IBM J. Res. Dev..

[128]  Sampo Pyysalo,et al.  Overview of the Pathway Curation (PC) task of BioNLP Shared Task 2013 , 2013, BioNLP@ACL.

[129]  Sampo Pyysalo,et al.  A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text , 2013, Bioinform..

[130]  Pim van der Eijk Automating the Acquisition of Bilingual Terminology , 1993, EACL.

[131]  Stephan Oepen,et al.  Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules , 2010, CoNLL Shared Task.

[132]  Sophia Ananiadou,et al.  Negated bio-events: analysis and identification , 2013, BMC Bioinformatics.

[133]  Anna Korhonen,et al.  Exploring subdomain variation in biomedical language , 2010, BMC Bioinformatics.

[134]  César de Pablo-Sánchez,et al.  Resolving anaphoras for the extraction of drug-drug interactions in pharmacological documents , 2010, BMC Bioinformatics.

[135]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[136]  P J Haug,et al.  Development and evaluation of a computerized admission diagnoses encoding system. , 1996, Computers and biomedical research, an international journal.

[137]  Hong Yu,et al.  Automatic discourse connective detection in biomedical text , 2012, J. Am. Medical Informatics Assoc..

[138]  Sophia Ananiadou,et al.  A data and analysis resource for an experiment in text mining a collection of micro-blogs on a political topic , 2012, LREC.

[139]  Sophia Ananiadou,et al.  A hybrid approach to recognising discourse causality in the biomedical domain , 2013, 2013 IEEE International Conference on Bioinformatics and Biomedicine.

[140]  Sampo Pyysalo,et al.  EXTRACTING BIO‐MOLECULAR EVENTS FROM LITERATURE—THE BIONLP’09 SHARED TASK , 2011, Comput. Intell..

[141]  Hong Yu,et al.  Biomedical negation scope detection with conditional random fields , 2010, J. Am. Medical Informatics Assoc..

[142]  Claire Grover,et al.  The ITI TXM Corpora: Tissue Expressions and Protein-Protein Interactions , 2008 .

[143]  Padmini Srinivasan,et al.  Categorization of Sentence Types in Medical Abstracts , 2003, AMIA.

[144]  Elaine Marsh,et al.  MUC-7 Evaluation of IE Technology: Overview of Results , 1998, MUC.

[145]  Alexander M. Fraser,et al.  Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora , 2004, NAACL.

[146]  K. Bretonnel Cohen,et al.  Getting Started in Text Mining , 2008, PLoS Comput. Biol..

[147]  Robert E. Mercer,et al.  Identifying Explicit Discourse Connectives in Text , 2013, Canadian Conference on AI.

[148]  Sophia Ananiadou,et al.  Something Old, Something New: Identifying Knowledge Source in Bio-events , 2013, Int. J. Comput. Linguistics Appl..

[149]  Sophia Ananiadou,et al.  Adapting the Cluster Ranking Supervised Model to Resolve Coreferences in the Drug Literature , 2011, LBM 2011.

[150]  Anna Korhonen,et al.  Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review , 2013, Bioinform..

[151]  Philip V. Ogren,et al.  Knowtator: A Protégé plug-in for annotated corpus construction , 2006, NAACL.

[152]  Damian Szklarczyk,et al.  STITCH 3: zooming in on protein–chemical interactions , 2011, Nucleic Acids Res..

[153]  Leen Breure,et al.  Modeling Rhetoric in Scientific Publications , 2008 .

[154]  Sampo Pyysalo,et al.  brat: a Web-based Tool for NLP-Assisted Text Annotation , 2012, EACL.

[155]  Zellig S. Harris,et al.  Mathematical structures of language , 1968, Interscience tracts in pure and applied mathematics.

[156]  Kalina Bontcheva,et al.  Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics , 2013, PLoS Comput. Biol..

[157]  Reinhard Rapp,et al.  Automatic Identification of Word Translations from Unrelated English and German Corpora , 1999, ACL.

[158]  Michael F. Lynch,et al.  Extraction of Information from the Text of Chemical Patents. 1. Identification of Specific Chemical Names , 1998, J. Chem. Inf. Comput. Sci..

[159]  Michael Krauthammer,et al.  Shallow Semantic Parsing of Randomized Controlled Trial Reports , 2006, AMIA.

[160]  Erik M. van Mulligen,et al.  A fast rule-based approach for biomedical event extraction , 2013, BioNLP@ACL.

[161]  Armando Blanco,et al.  Collaborative text-annotation resource for disease-centered relation extraction from biomedical text , 2009, J. Biomed. Informatics.

[162]  Anna Korhonen,et al.  Using Argumentative Zones for Extractive Summarization of Scientific Articles , 2012, COLING.

[163]  Martin Hofmann-Apitius,et al.  Detection of IUPAC and IUPAC-like chemical names , 2008, ISMB.

[164]  Scott T. Weiss,et al.  Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system , 2006, BMC Medical Informatics Decis. Mak..

[165]  Russ B. Altman,et al.  Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text , 2009, BMC Bioinformatics.

[166]  Emmanuel Morin,et al.  Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora , 2011, BUCC@ACL.

[167]  Vassiliki Rizomilioti Exploring Epistemic Modality in Academic Discourse Using Corpora , 2006 .

[168]  Catalina O. Tudor,et al.  BioCreative IV Interactive Task , 2013 .

[169]  Mitchell P. Marcus,et al.  Text Chunking using Transformation-Based Learning , 1995, VLC@ACL.

[170]  Wen-Lian Hsu,et al.  A Survey of State of the Art Biomedical Text Mining Techniques for Semantic Analysis , 2008, 2008 IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (sutc 2008).

[171]  Jun'ichi Tsujii,et al.  Adapting a Probabilistic Disambiguation Model of an HPSG Parser to a New Domain , 2005, IJCNLP.

[172]  Joel D. Martin,et al.  Automated Information Extraction of Key Trial Design Elements from Clinical Trial Publications , 2008, AMIA.

[173]  János Csirik,et al.  The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes , 2008, BMC Bioinformatics.

[174]  Sophia Ananiadou,et al.  Identification of Manner in Bio-Events , 2012, LREC.

[175]  Kyo Kageura,et al.  METHODS OF AUTOMATIC TERM RECOGNITION : A REVIEW , 1996 .

[176]  Halil Kilicoglu,et al.  Syntactic Dependency Based Heuristics for Biological Event Extraction , 2009, BioNLP@HLT-NAACL.

[177]  Marcelo Tallis,et al.  Triage with the SciKnowMine System in the Mouse Genome Informatics ( MGI ) Curation Process , 2013 .

[178]  Carol Friedman,et al.  Research Paper: A General Natural-language Text Processor for Clinical Radiology , 1994, J. Am. Medical Informatics Assoc..

[179]  Sophia Ananiadou,et al.  Enriching a biomedical event corpus with meta-knowledge annotation , 2011, BMC Bioinformatics.

[180]  Kalina BontchevaHamish,et al.  Universities of Leeds, Sheffield and York , 2022 .

[181]  Philipp Koehn,et al.  Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.

[182]  Nigel Collier,et al.  Extracting the Names of Genes and Gene Products with a Hidden Markov Model , 2000, COLING.

[183]  Xiaoqiang Luo,et al.  On Coreference Resolution Performance Metrics , 2005, HLT.

[184]  Kenneth Ward Church,et al.  Robust Bilingual Word Alignment for Machine Aided Translation , 1993, VLC@ACL.

[185]  John F. Hurdle,et al.  Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research , 2008, Yearbook of Medical Informatics.

[186]  Steven J. Marygold,et al.  tagtog : Interactive Human and Machine Annotation of Gene Mentions in PLOS Full-Text Articles , 2013 .

[187]  Ron D. Appel,et al.  ExPASy: the proteomics server for in-depth protein knowledge and analysis , 2003, Nucleic Acids Res..

[188]  Mariana L. Neves,et al.  Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts , 2013, Database J. Biol. Databases Curation.

[189]  Halil Kilicoglu,et al.  Recognizing speculative language in biomedical research articles: a linguistically motivated perspective , 2008, BMC Bioinformatics.

[190]  Sophia Ananiadou,et al.  A three-way perspective on scientific discourse annotation for knowledge extraction , 2012, ACL 2012.

[191]  Roser Morante,et al.  Memory-Based Resolution of In-Sentence Scopes of Hedge Cues , 2010, CoNLL Shared Task.

[192]  Livio Robaldo,et al.  The Penn Discourse TreeBank 2.0. , 2008, LREC.

[193]  EHARA Terumasa,et al.  Rule based machine translation combined with statistical post editor for Japanese to English patent translation , 2007, MTSUMMIT.

[194]  Claire Lemaire,et al.  Extraction of Domain-Specific Bilingual Lexicon from Comparable Corpora: Compositional Translation and Ranking , 2012, COLING.

[195]  Ted Briscoe,et al.  Statistical Anaphora Resolution in Biomedical Texts , 2008, COLING.

[196]  Mihai Surdeanu,et al.  Event Extraction as Dependency Parsing , 2011, ACL.

[197]  S Shiffman,et al.  A free-text processing system to capture physical findings: Canonical Phrase Identification System (CAPIS). , 1991, Proceedings. Symposium on Computer Applications in Medical Care.

[198]  Jun'ichi Tsujii,et al.  Tuning support vector machines for biomedical named entity recognition , 2002, ACL Workshop on Natural Language Processing in the Biomedical Domain.

[199]  Hal Daumé,et al.  Domain Adaptation for Machine Translation by Mining Unseen Words , 2011, ACL.

[200]  Ralph Grishman,et al.  Adaptive Information Extraction and Sublanguage Analysis , 2001 .

[201]  Gary D Bader,et al.  BIND--The Biomolecular Interaction Network Database. , 2001, Nucleic acids research.

[202]  Sophia Ananiadou,et al.  Text mining and its potential applications in systems biology. , 2006, Trends in biotechnology.

[203]  James Pustejovsky,et al.  FactBank: a corpus annotated with event factuality , 2009, Lang. Resour. Evaluation.

[204]  Cathy H. Wu,et al.  RLIMS-P : Literature-based curation of protein phosphorylation information , 2013 .

[205]  Michael Krauthammer,et al.  Term identification in the biomedical literature , 2004, J. Biomed. Informatics.

[206]  Paloma Martínez,et al.  A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents , 2011, BMC Bioinformatics.

[207]  Hsinchun Chen,et al.  Disease named entity recognition using semisupervised learning and conditional random fields , 2011, J. Assoc. Inf. Sci. Technol..

[208]  Karin M. Verspoor,et al.  BioC: a minimalist approach to interoperability for biomedical text processing , 2013, AMIA.

[209]  Vincent Ng,et al.  Supervised Models for Coreference Resolution , 2009, EMNLP.

[210]  Zhiyong Lu,et al.  DNorm: disease name normalization with pairwise learning to rank , 2013, Bioinform..

[211]  Andrew McCallum,et al.  Combining joint models for biomedical event extraction , 2012, BMC Bioinformatics.

[212]  Christian Boitet,et al.  Automated Translation at Grenoble University , 1985, Comput. Linguistics.

[213]  Horacio Rodríguez,et al.  Improving Term Extraction by System Combination Using Boosting , 2001, ECML.

[214]  Xuan Wang,et al.  Exploiting Rich Features for Detecting Hedges and their Scope , 2010, CoNLL Shared Task.

[215]  Ágnes Sándor,et al.  Modeling metadiscourse conveying the author's rhetorical strategy in biomedical research abstracts , 2007 .

[216]  Sophia Ananiadou,et al.  Boosting automatic event extraction from the literature using domain adaptation and coreference resolution , 2012, Bioinform..

[217]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[218]  Yu-Hsiang Lin,et al.  Anaphora Resolution for Biomedical Literature by Exploiting Multiple Resources , 2005, IJCNLP.

[219]  Graciela Gonzalez,et al.  BANNER: An Executable Survey of Advances in Biomedical Named Entity Recognition , 2007, Pacific Symposium on Biocomputing.

[220]  Veronika Vincze,et al.  Linguistic scope-based and biological event-based speculation and negation annotations in the BioScope and Genia Event corpora , 2011, J. Biomed. Semant..

[221]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[222]  Jong C. Park,et al.  BioAR: Anaphora Resolution for Relating Protein Names to Proteome Database Entries , 2004 .

[223]  Bart Barlogie,et al.  Bortezomib induces osteoblast differentiation via Wnt-independent activation of beta-catenin/TCF signaling. , 2008, Blood.

[224]  K. Bretonnel Cohen,et al.  Concept annotation in the CRAFT corpus , 2012, BMC Bioinformatics.

[225]  Hyon B. Shin,et al.  Language Use and English-Speaking Ability: 2000. Census 2000 Brief. , 2003 .

[226]  Patrick Drouin,et al.  Term extraction using non-technical corpora as a point of leverage , 2003 .

[227]  Anna Korhonen,et al.  Weakly supervised learning of information structure of scientific abstracts - is it accurate enough to benefit real-world tasks in biomedicine? , 2011, Bioinform..

[228]  Ulf Leser,et al.  ALIBABA: PubMed as a graph , 2006, Bioinform..

[229]  Kimberly Van Auken,et al.  Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation , 2009, BMC Bioinformatics.

[230]  Fernando Pereira,et al.  Identifying gene and protein mentions in text using conditional random fields , 2005, BMC Bioinformatics.

[231]  Ioannis Korkontzelos,et al.  Unsupervised learning of multiword expressions , 2010 .

[232]  Alexander A. Morgan,et al.  Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup , 2003, ISMB.

[233]  H. Lehmann,et al.  Clinical Decision Support Systems (cdsss) Have Been Hailed for Their Potential to Reduce Medical Errors Clinical Decision Support Systems for the Practice of Evidence-based Medicine , 2022 .

[234]  Jari Björne,et al.  Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization , 2013, PloS one.

[235]  Jari Björne,et al.  Extracting Complex Biological Events with Rich Graph-Based Feature Sets , 2009, BioNLP@HLT-NAACL.

[236]  Zhiyong Lu,et al.  Semi-automatic semantic annotation of PubMed queries: A study on quality, efficiency, satisfaction , 2011, J. Biomed. Informatics.

[237]  Hagit Shatkay,et al.  Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users , 2008, Bioinform.

[238]  K. Hyland,et al.  Talking to the Academy , 1996 .

[239]  Sophia Ananiadou,et al.  Building a Coreference-Annotated Corpus from the Domain of Biochemistry , 2011, BioNLP@ACL.

[240]  Tingting Mu,et al.  ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials , 2012, BMC Medical Informatics and Decision Making.

[241]  Fabio Rinaldi,et al.  ODIN: a customizable literature curation tool , 2013 .

[242]  Sophia Ananiadou,et al.  Argo: an integrative, interactive, text mining-based workbench supporting curation , 2012, Database J. Biol. Databases Curation.

[243]  Fei Xia,et al.  Statistical machine translation for biomedical text: are we there yet? , 2011, AMIA ... Annual Symposium proceedings. AMIA Symposium.

[244]  Dietrich Rebholz-Schuhmann,et al.  Automatic recognition of conceptualization zones in scientific articles and two life science applications , 2012, Bioinform.

[245]  Özlem Uzuner,et al.  Recognizing Obesity and Comorbidities in Sparse Data , 2009, J. Am. Medical Informatics Assoc.

[246]  David W. Embley,et al.  Generating Medical Logic Modules for Clinical Trial Eligibility Criteria , 2003, AMIA.

[247]  Alfonso Valencia,et al.  Implementing the iHOP concept for navigation of biomedical literature , 2005, ECCB/JBI.

[248]  William R. Hersh,et al.  A survey of current work in biomedical text mining , 2005, Briefings Bioinform.

[249]  Pierre Zweigenbaum,et al.  Looking for Candidate Translational Equivalents in Specialized, Comparable Corpora , 2002, COLING.

[250]  Jari Björne,et al.  Semantically linking molecular entities in literature through entity relationships , 2012, BMC Bioinformatics.

[251]  Nigel Collier,et al.  Introduction to the Bio-entity Recognition Task at JNLPBA , 2004, NLPBA/BioNLP.

[252]  Laurel D. Riek,et al.  Callisto: A Configurable Annotation Workbench , 2004, LREC.

[253]  Eduard H. Hovy,et al.  BLANC: Implementing the Rand index for coreference evaluation , 2010, Natural Language Engineering.

[254]  Sophia Ananiadou,et al.  High-Precision Semantic Search by Generating and Testing Questions , 2010 .

[255]  Hagit Shatkay,et al.  New directions in biomedical text annotation: definitions, guidelines and corpus construction , 2006, BMC Bioinformatics.

[256]  S G P Vellay,et al.  Interactive text mining with Pipeline Pilot: a bibliographic web-based tool for PubMed , 2009, Infectious disorders drug targets.

[257]  Michael R. Seringhaus,et al.  Seeking a New Biology through Text Mining , 2008, Cell.

[258]  Antje Chang,et al.  New Developments , 2003 .

[259]  Pascale Fung,et al.  A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups , 2004, Machine Translation.

[260]  Halil Kilicoglu,et al.  Biological event composition , 2012, BMC Bioinformatics.

[261]  Marc Moens,et al.  Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status , 2002, CL.

[262]  José Luís Oliveira,et al.  Egas – Collaborative Biomedical Annotation as a Service ! , 2013 .

[263]  Maria Liakata,et al.  Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes , 2010, BioNLP@ACL.

[264]  Ulf Leser,et al.  Evaluation of the CellFinder pipeline in the BioCreative IV User Interactive task , 2013 .

[265]  Éric Gaussier,et al.  Improving Corpus Comparability for Bilingual Lexicon Extraction from Comparable Corpora , 2010, COLING.

[266]  Didier Bourigault,et al.  Surface Grammatical Analysis for the Extraction of Terminological Noun Phrases , 1992, COLING.

[267]  J. Pustejovsky,et al.  Medstract: Creating Large-scale Information Servers for biomedical libraries , 2002 .

[268]  Hideki Mima,et al.  Automatic recognition of multi-word terms: the C-value/NC-value method , 2000, International Journal on Digital Libraries.

[269]  Wen-Lian Hsu,et al.  T-HOD: a literature-based candidate gene database for hypertension, obesity and diabetes , 2013, Database J. Biol. Databases Curation.

[270]  Angus Roberts,et al.  Building a semantically annotated corpus of clinical texts , 2009, J. Biomed. Informatics.

[271]  Akinori Yonezawa,et al.  The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011 , 2012, BMC Bioinformatics.

[272]  Hongfang Liu,et al.  Biological Nomenclatures: A Source of Lexical Knowledge and Ambiguity , 2004, Pacific Symposium on Biocomputing.

[273]  Padmini Srinivasan,et al.  The Language of Bioscience: Facts, Speculations, and Statements In Between , 2004, HLT-NAACL 2004.

[274]  Naoaki Okazaki,et al.  Kleio: a knowledge-enriched information retrieval system for biology , 2008, SIGIR '08.

[275]  Sophia Ananiadou,et al.  Text Mining for Biology And Biomedicine , 2005 .

[276]  Sophia Ananiadou,et al.  Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry , 2011, PloS one.

[277]  Sampo Pyysalo,et al.  Overview of the Cancer Genetics (CG) task of BioNLP Shared Task 2013 , 2013, BioNLP@ACL.

[278]  K. Bretonnel Cohen,et al.  Text mining for the biocuration workflow , 2012, Database J. Biol. Databases Curation.

[279]  Robert B. Russell,et al.  SuperTarget and Matador: resources for exploring drug-target relationships , 2007, Nucleic Acids Res.

[280]  Alexander H. Waibel,et al.  Improving Statistical Machine Translation in the Medical Domain using the Unified Medical Language system , 2004, COLING.

[281]  Peter Willett,et al.  Protein Structures and Information Extraction from Biological Texts: The PASTA System , 2003, Bioinform.

[282]  Elena Beisswanger,et al.  The GeneReg Corpus for Gene Expression Regulation Events — An Overview of the Corpus and its In-Domain and Out-of-Domain Interoperability , 2010, LREC.

[283]  Ulf Leser,et al.  GeneView: a comprehensive semantic search engine for PubMed , 2012, Nucleic Acids Res.

[284]  Hong Yu,et al.  BioN∅T: A searchable database of biomedical negated sentences , 2011, BMC Bioinformatics.

[285]  Jun'ichi Tsujii,et al.  Event Extraction with Complex Event Classification Using Rich Features , 2010, J. Bioinform. Comput. Biol.

[286]  Dan Klein,et al.  Mention Detection: Heuristics for the OntoNotes annotations , 2011, CoNLL Shared Task.

[287]  Martin Schierle,et al.  A Survey of Text Mining Architectures and the UIMA Standard , 2012, LREC.

[288]  Dan Roth,et al.  Design Challenges and Misconceptions in Named Entity Recognition , 2009, CoNLL.

[289]  Sophia Ananiadou,et al.  Building a Bio-Event Annotated Corpus for the Acquisition of Semantic Frames from Biomedical Corpora , 2008, LREC.

[290]  K. Bretonnel Cohen,et al.  The structural and content aspects of abstracts versus bodies of full text journal articles are different , 2010, BMC Bioinformatics.

[291]  Xiaoyan Wang,et al.  Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study , 2009, Journal of the American Medical Informatics Association: JAMIA.

[292]  G Demetriou,et al.  Two applications of information extraction to biological science journal articles: enzyme interactions and protein structures , 1999, Pacific Symposium on Biocomputing.

[293]  Dietrich Rebholz-Schuhmann,et al.  The BioLexicon: a large-scale terminological resource for biomedical text mining , 2011, BMC Bioinformatics.

[294]  T. Takagi,et al.  Toward information extraction: identifying protein names from biological papers , 1998, Pacific Symposium on Biocomputing.

[295]  Thorsten Meinl,et al.  KNIME: The Konstanz Information Miner , 2007, GfKl.

[296]  D. Marcu,et al.  Processing Comparable Corpora With Bilingual Suffix Trees , 2002, EMNLP.

[297]  Jun'ichi Tsujii,et al.  Overview of BioNLP 2011 Protein Coreference Shared Task , 2011, BioNLP@ACL.

[298]  Cristina Nicolae,et al.  BESTCUT: A Graph Algorithm for Coreference Resolution , 2006, EMNLP.

[299]  Lynette Hirschman,et al.  A Model-Theoretic Coreference Scoring Scheme , 1995, MUC.

[300]  Jun'ichi Tsujii,et al.  Semantic Retrieval for the Accurate Identification of Relational Concepts in Massive Textbases , 2006, ACL.

[301]  Sophia Ananiadou,et al.  Meta-Knowledge Annotation of Bio-Events , 2010, LREC.

[302]  Sampo Pyysalo,et al.  Overview of BioNLP Shared Task 2013 , 2013, BioNLP@ACL.

[303]  S. Brunak,et al.  Mining electronic health records: towards better research applications and clinical care , 2012, Nature Reviews Genetics.

[304]  Jun'ichi Tsujii,et al.  Feature Forest Models for Probabilistic HPSG Parsing , 2008, CL.

[305]  Sophia Ananiadou,et al.  Interoperability and Customisation of Annotation Schemata in Argo , 2014, LREC.

[306]  Andreas Vlachos,et al.  Bootstrapping and Evaluating Named Entity Recognition in the Biomedical Domain , 2006, BioNLP@NAACL-HLT.