Developing an automatic metadata harvesting and generation system for a continuing education repository: a pilot study

The goal of this pilot study is to assess the effectiveness and reliability of an automated metadata generation and harvesting system developed for a project repository that hosts continuing education resources for cataloging and metadata professionals. Using a web crawler developed for the repository, 500 web resources were selected as seed pages for metadata extraction and generation. This paper summarizes the process and the results of the study. The results indicate that combining the metadata harvesting system with article analysis and data generation tools, such as Adlegant’s Article Analysis API, significantly improves metadata generation.