Paper Information - Section Identification to Improve Information Extraction from Chinese Medical Literature

The Chinese medical literature contains a large amount of knowledge. To reduce the effort medical scholars spend extracting this knowledge, literature analysis must identify the key information in each paper. We argue that identifying the sections of a paper helps filter out noise and increases the accuracy of extracting experimental findings. In this research in progress, we treat paper section identification as a sentence classification task and apply Conditional Random Fields (CRFs) to it. Our model combines lexical and structural features to facilitate section identification. Experiments on a human-curated asthma dataset show that our approach achieves a 10%–20% performance improvement over Support Vector Machines (SVMs), and that using both bag-of-words features and domain lexicons benefits the task.
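
As a rough illustration of what combining lexical and structural features per sentence could look like, the following minimal sketch builds one feature dictionary per sentence of a paper. It is not the authors' implementation: the lexicon entries, the feature names, the example sentences, and the assumption that sentences arrive already segmented into tokens are all hypothetical. The output, one list of feature dictionaries per paper, is the kind of input that linear-chain CRF toolkits commonly accept.

```python
from typing import Dict, List, Set

# Hypothetical domain lexicon; the lexicons actually used in the paper
# are not given in the abstract.
DOMAIN_LEXICON: Set[str] = {"哮喘", "治疗", "对照组"}


def sentence_features(doc: List[List[str]], i: int,
                      lexicon: Set[str]) -> Dict[str, float]:
    """Build lexical and structural features for sentence i of a paper.

    `doc` is a list of sentences, each already segmented into tokens
    (Chinese word segmentation is assumed to have happened upstream).
    """
    tokens = doc[i]
    feats: Dict[str, float] = {"bias": 1.0}

    # Lexical: bag-of-words indicator for each distinct token.
    for tok in set(tokens):
        feats[f"bow={tok}"] = 1.0

    # Lexical: number of tokens found in the domain lexicon.
    feats["lexicon_hits"] = float(sum(tok in lexicon for tok in tokens))

    # Structural: where the sentence sits within the paper.
    feats["rel_pos"] = i / max(len(doc) - 1, 1)
    feats["is_first"] = float(i == 0)
    feats["is_last"] = float(i == len(doc) - 1)
    return feats


def document_features(doc: List[List[str]],
                      lexicon: Set[str] = DOMAIN_LEXICON
                      ) -> List[Dict[str, float]]:
    """Return one feature dict per sentence, in document order."""
    return [sentence_features(doc, i, lexicon) for i in range(len(doc))]


# Made-up three-sentence paper, pre-segmented into tokens.
doc = [
    ["探讨", "哮喘", "治疗", "方法"],
    ["选取", "患者", "分为", "对照组"],
    ["结果", "显示", "疗效", "显著"],
]
X = document_features(doc)
```

A sequence model such as a CRF would then be trained on these per-paper feature sequences paired with section labels, so that the label of one sentence can inform the labels of its neighbours. An SVM baseline would instead classify each sentence independently.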

Xin Li | Sijia Zhou
