Welcome to the HLT-NAACL'06 BioNLP Workshop, Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis.
The late 1990s saw the beginning of a trend towards significant growth in the area of biomedical language processing, and in particular in the use of natural language processing techniques in the molecular biology and related computational bioscience domains. The figure below gives an indication of the amount of recent activity in this area. It shows the cumulative number of documents returned by searching PubMed, the premier repository of biomedical scientific literature, with the query ((natural language processing) OR (text mining)) AND (gene OR protein), limiting the search by year for every year from 1999 through 2005. The three papers in 1999 had grown to 227 by the end of 2005.
Significant challenges to biological literature exploitation remain, in particular for such biological problem areas as automated function prediction and pathway reconstruction and for linguistic applications such as relation extraction and abstractive summarization. In light of these remaining challenges, this workshop was intended to focus on applications that move towards deeper semantic analysis. We particularly solicited work that addresses relatively under-explored areas such as summarization and question-answering from biological information.
Papers describing applications of semantic processing technologies to the biology domain were especially invited. That is, the primary topics of interest were applications which require deeper linguistic analysis of the biological literature. We also solicited papers exploring issues in porting NLP systems originally constructed for other domains to the biology domain. What makes the biology domain special? What hurdles must be overcome in performing linguistic analysis of biological text? Are any special linguistic or knowledge resources required, beyond a domain-specific lexicon? What relations in biological text are most interesting to biologists, and hence should be the focus of our future efforts?
The workshop received 31 submissions: 29 full-paper submissions and two poster submissions. A strong program committee, representing BioNLP researchers in North America, Europe, and Asia, provided thorough reviews, resulting in the acceptance of eleven full papers and nineteen posters. The acceptance rate for full papers was 38% (11/29), which we believe made this one of the most competitive BioNLP workshop or conference sessions to date.
A notable trend in the accepted papers is that only one of them was on the topic of entity identification. The subject areas of the papers presented at BioNLP'06 included an exceptionally wide range of topics: question-answering, computational lexical semantics, information extraction, entity normalization, semantic role labelling, image classification, and syntactic aspects of the sublanguage of molecular biology.