Katsir: A Framework for Harvesting Digital Libraries on the Web

The information era has brought with it the well-known problem of 'Information Explosion'. Many and varied search engines exist on the Internet, but it is still hard to locate and concentrate only on materials relevant to a specific task. Digital libraries (DLs), on the other hand, provide better services for focused discovery of relevant Web resources. However, digital libraries have been much less researched and implemented than search engines. The 'Katsir/Harvest' project laid the ground for our understanding that a new paradigm should be developed: the Harvested Digital Library (HDL). The contribution of this article is a new framework and harvesting model for constructing HDLs. The open harvesting architecture proposed here uses advanced information retrieval tools and provides a set of integrated DL services to its users. This model and architecture are discussed throughout the article, including a description of the implemented Katsir system and a discussion of future research directions. Future DLs will be knowledge-rich in the sense that each DL contains relevant meta-information on its domain and employs advanced knowledge management techniques.