Knowledge discovery in digital libraries of electronic theses and dissertations: an NDLTD case study

Many scholarly writings are now available in electronic form. As universities around the world make digital versions of their dissertations, theses, project reports, and related files and data sets available online, an overwhelming amount of information is becoming accessible on almost any topic. How will users decide which dissertation, or which section of a dissertation, to read to find the information they need on a particular topic? What kinds of services can such digital libraries provide to make knowledge discovery easier? In this paper, we investigate these questions, using as a case study the Networked Digital Library of Theses and Dissertations (NDLTD), a rapidly growing collection that already holds about 800,000 Electronic Theses and Dissertations (ETDs) from universities around the world. We propose the design of a scalable, Web Services-based tool, KDWebS (Knowledge Discovery System based on Web Services), to facilitate automated knowledge discovery in NDLTD. We also present preliminary proof-of-concept results to demonstrate the efficacy of the approach.
