Paper information - Classification techniques with minimal labelling effort and application to medical reports.

Text documents can be classified in a number of ways. Here, we use Partially Supervised Classification (PSC) and argue that it is an effective and efficient approach for real-world problems. PSC uses a two-step strategy to reduce the labelling effort, and several methods have been proposed for each step. We evaluate a range of these methods on real-world medical documents. The results show that building the classifier with Expectation-Maximization (EM) yields better results than building it with Support Vector Machines (SVM). We also show experimentally that carefully selecting a subset of features to represent the documents can improve performance.

Beatriz de la Iglesia | Fathi H Saad | G Duncan Bell
