Automatic Syllabus Classification Using Support Vector Machines

Syllabi are important educational resources. Gathering freely available syllabi and building useful services on top of the collection is of great value to the educational community. However, searching for a syllabus on the Web with a generic search engine is error-prone and often yields too many irrelevant links. In this chapter, we describe our empirical study on automatic syllabus classification using Support Vector Machines (SVMs) to filter noise out of search results. We describe the steps of the classification process, from training data preparation and feature selection to classifier building with SVMs. Empirical results are provided and discussed. We hope the work reported here will also benefit people who are interested in building other genre-specific repositories.
