Using Automatic Metadata Extraction to Build a Structured Syllabus Repository

Syllabi are important documents that instructors create for their students. Gathering freely available syllabi and building useful services on top of the collection would yield a digital library of value to the educational community. However, building such a repository is complicated by the unstructured nature of syllabi and the lack of a unified vocabulary for constructing them. In this paper, we propose an intelligent approach that automatically annotates freely available syllabi from the Web, benefiting the educational community through supporting services such as semantic search. We describe in detail our process for converting unstructured syllabi into structured representations through entity recognition, segmentation, and association. Our evaluation results demonstrate the effectiveness of our extractor and also suggest improvements. We hope our work will benefit not only users of our services but also people interested in building other genre-specific repositories.
