There are 43,000 online media outlets in Indonesia, each publishing at least one to two news articles every hour. This volume of information exceeds human processing capacity and leads to effects such as confusion and psychological stress. In this research we propose a new system that processes incremental news data and provides a mechanism for determining representative news by applying an Automatic Clustering algorithm. The system consists of 4 main functions: (1) Data Acquisition and Preprocessing, (2) Keyword Feature Extraction, (3) Data Aggregation and Automatic Clustering, and (4) Incremental Clustering. News articles that carry the same information are grouped together using information-retrieval techniques. The system runs in a big data environment so that it can process large amounts of data. Over one full day, the system collected 3,000 news articles in its database. The collected articles were processed using Automatic Clustering and automatically grouped into 389 clusters. One cluster is identified as the unknown cluster, and the clusters are evaluated with single-member clusters excluded. In the experimental study, the system achieved a performance of 93.51%.
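The keyword extraction, incremental clustering, and representative-selection steps described above can be illustrated with a minimal sketch. It is not the Automatic Clustering algorithm used in this work: it substitutes a simple cosine-similarity threshold for the automatic determination of clusters, and the stopword list, the `threshold` value, and all names (`keywords`, `IncrementalClusterer`, `representative`) are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a full
# language-specific list.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "to", "and", "is", "for"}


def keywords(text):
    """Return a term-frequency vector of lowercase non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Assigns each incoming article to the most similar existing cluster,
    or opens a new cluster when no similarity reaches the threshold."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, not from the paper
        self.centroids = []         # summed term vectors, one per cluster
        self.members = []           # article texts, one list per cluster

    def add(self, text):
        vec = keywords(text)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            self.centroids[best].update(vec)  # Counter.update adds counts
            self.members[best].append(text)
            return best
        self.centroids.append(Counter(vec))
        self.members.append([text])
        return len(self.centroids) - 1

    def representative(self, cluster_id):
        """Return the member article closest to the cluster centroid."""
        centroid = self.centroids[cluster_id]
        return max(self.members[cluster_id],
                   key=lambda m: cosine(keywords(m), centroid))


clusterer = IncrementalClusterer(threshold=0.3)
for article in [
    "Flood hits Jakarta after heavy rain",
    "Heavy rain causes flood in Jakarta",
    "Central bank raises interest rate",
]:
    print(clusterer.add(article), article)
```

Because each article is compared only with the current cluster centroids, new articles can be added as they arrive without re-clustering the whole collection, which is the property the incremental step relies on.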