An OAI-Based Filtering Service for CITIDEL from NDLTD

One goal of the Computing and Information Technology Interactive Digital Educational Library (CITIDEL) is to maximize the number of computing-related resources it makes available to computer science scholars and practitioners. In this paper, we describe a set of experiments designed to advance this goal by adding to CITIDEL a sub-collection of computing-related electronic theses and dissertations (ETDs) automatically extracted from the Networked Digital Library of Theses and Dissertations (NDLTD) OAI Union Catalog. We analyze the metadata quality of the NDLTD OAI Union Catalog and describe three experiments that combine different sources of evidence to improve the accuracy of identifying the computing-related entries.
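
The abstract does not say how the sources of evidence are combined. Purely as an illustration of the general idea, the sketch below scores a Dublin Core style record (as harvested over OAI) using two hypothetical evidence sources, a match on the `subject` field and a keyword density over `title` and `description`, and combines them with a weighted sum and threshold. The term list, weights, threshold, and function names are all assumptions for illustration, not the method used in the experiments. Combining evidence helps when metadata quality is uneven: a record with a missing or vague subject can still be caught by its text.

```python
import re

# Hypothetical vocabulary of computing-related terms (an assumption, not from the paper).
COMPUTING_TERMS = {"computer", "computing", "software", "algorithm", "algorithms",
                   "network", "database", "programming"}


def tokens(text):
    """Lowercase alphabetic word tokens from a metadata field (None-safe)."""
    return re.findall(r"[a-z]+", (text or "").lower())


def subject_evidence(record):
    """Return 1.0 if any subject value contains a computing term, else 0.0."""
    for subject in record.get("subject", []):
        if COMPUTING_TERMS & set(tokens(subject)):
            return 1.0
    return 0.0


def text_evidence(record):
    """Return the fraction of title + description tokens that are computing terms."""
    words = tokens(record.get("title")) + tokens(record.get("description"))
    if not words:
        return 0.0
    return sum(w in COMPUTING_TERMS for w in words) / len(words)


def is_computing_related(record, w_subject=0.6, w_text=0.4, threshold=0.3):
    """Combine both evidence sources with hypothetical weights and a threshold."""
    score = w_subject * subject_evidence(record) + w_text * text_evidence(record)
    return score >= threshold


if __name__ == "__main__":
    etd = {"title": "Parallel algorithms for sparse matrices",
           "subject": ["Computer Science"],
           "description": "We study algorithms."}
    other = {"title": "Soil erosion in river basins",
             "subject": ["Geology"],
             "description": "Field study of sediment transport."}
    print(is_computing_related(etd))    # True
    print(is_computing_related(other))  # False
```

In a real filter the fixed weights would more likely be replaced by a trained classifier; the weighted sum is used here only to keep the combination step visible.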
