Semi-supervised learning for classification on Chinese drug treatment questions

Building automatic question answering (QA) systems for online healthcare communities has attracted considerable research attention, because too few practitioners are available to give precise answers to patients' consultations. In this paper, we focus on the task of classifying drug treatment questions, which can help in building a QA system for such questions. Because labeled data are scarce and labeling is costly, we consider using task-related documents and texts from the internet. We collect a large number of unlabeled question-answer pairs and drug instructions from the internet and design a co-training style method, a form of semi-supervised learning, to make use of them. Using several specified models as classifiers, we show that a classifier trained with our method outperforms the same model trained with supervised learning, while using less labeled data as input.
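
As a rough illustration of what a co-training style loop looks like, the sketch below grows a small labeled set from unlabeled question-answer pairs. It is not the method of this paper: the two views (question text and answer text), the toy naive Bayes classifier `TinyNB`, the function `co_train`, the confidence threshold, and the English example data are all assumptions made for the sake of a runnable example.

```python
import math
from collections import Counter, defaultdict


class TinyNB:
    """Toy multinomial naive Bayes over whitespace tokens (hypothetical helper)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict_proba(self, text):
        total = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            # Laplace smoothing; words outside the vocabulary are skipped.
            denom = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / total)
            for w in text.split():
                if w in self.vocab:
                    score += math.log((self.counts[c][w] + 1) / denom)
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def co_train(labeled, unlabeled, rounds=3, threshold=0.7):
    """labeled: [((question, answer), label)]; unlabeled: [(question, answer)]."""
    labeled, pool = list(labeled), list(unlabeled)
    clf_q, clf_a = TinyNB(), TinyNB()

    def fit_both():
        ys = [y for _, y in labeled]
        clf_q.fit([x[0] for x, _ in labeled], ys)  # view 1: question text
        clf_a.fit([x[1] for x, _ in labeled], ys)  # view 2: answer text

    for _ in range(rounds):
        if not pool:
            break
        fit_both()
        remaining = []
        for views in pool:
            p_q = clf_q.predict_proba(views[0])
            p_a = clf_a.predict_proba(views[1])
            # Trust whichever view is more confident about this example.
            best = max((p_q, p_a), key=lambda p: max(p.values()))
            label = max(best, key=best.get)
            if best[label] >= threshold:
                labeled.append((views, label))  # pseudo-label joins the training set
            else:
                remaining.append(views)
        pool = remaining
    fit_both()  # final fit on the grown labeled set
    return clf_q, clf_a, labeled


# Made-up toy data: two labeled pairs, two unlabeled pairs.
seed = [
    (("dose of aspirin", "take one tablet daily"), "dosage"),
    (("side effects of ibuprofen", "may cause nausea"), "adverse"),
]
unlabeled_pairs = [
    ("dose of ibuprofen", "take two tablet daily"),
    ("side effects of aspirin", "may cause dizziness"),
]
clf_q, clf_a, grown = co_train(seed, unlabeled_pairs)
```

In this toy run the answer view is the more confident one for both unlabeled pairs, so its predictions supply the pseudo-labels that the question-view classifier is then retrained on. Real systems would replace `TinyNB` with stronger models and tune the threshold on held-out data.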