Paper Information - Fact or Fiction: Content Classification for Digital Libraries


The World-Wide Web (WWW) is a vast repository of information, much of which is valuable but often hidden from the user. The anarchic nature of the WWW presents unique challenges for information extraction and categorization. We view the WWW as a valuable resource for gathering information for digital libraries. In this paper we describe the process of extracting and classifying information from the WWW for the purpose of integrating it into digital libraries. Our efforts focus on ways to automatically classify news articles according to whether they present opinions or reported facts. We describe and evaluate a system in development that automatically classifies and recommends Web news articles from the sports and politics domains.
