Paper information - Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms by Thorsten Joachims

Those trying to make sense of textual content and semantics within the wild, wild world of information retrieval, categorization, and filtering often have to deal with an overwhelming sea of problems. The really strange part is that most of them (myself included) still believe that developing a linguistically principled approach to text categorization is an interesting research problem. This will also emerge in the discussion of the book that is the focus of this review. Learning to Classify Text Using Support Vector Machines by Thorsten Joachims proposes a theory for automatic learning of text categorization models that has repeatedly been shown to be very successful. At the same time, the proposed approach is based on a rather rough linguistic generalization of what is apparently a language-dependent task: topic text classification (TC). The result is twofold: on the one hand, a learning theory, based on statistical learnability principles and results, that avoids the limitations of the strong empiricism typical of most text classification research; on the other hand, the application of a naive linguistic model, the bag-of-words representation, to linguistic objects (i.e., the documents) that still achieves impressive accuracy.

Roberto Basili
