Representative News Generation using Automatic Clustering in Big Data Environment

Indonesia has 43,000 online media outlets, each publishing at least one to two news articles every hour. This volume of information exceeds human processing capacity and leads to effects such as confusion and psychological stress. In this research, we propose a new system that processes incremental news data and determines representative news by applying an Automatic Clustering algorithm. The system consists of 4 main functions: (1) Data Acquisition and Preprocessing, (2) Keyword Feature Extraction, (3) Data Aggregation and Automatic Clustering, and (4) Incremental Clustering. News articles carrying the same information are grouped together using an information-retrieval approach. The system runs in a big data environment so that it can process large amounts of data. Over one full day, the system collected 3,000 news articles in its database. The collected articles were processed using Automatic Clustering and automatically grouped into 389 clusters. One cluster is identified as the unknown cluster, and the clusters are evaluated excluding single-member clusters. In the experimental study, the system achieved a performance of 93.51%.
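The pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the feature weighting, the similarity measure, the cluster-creation rule, or how a representative article is chosen, so everything below is an assumption made for illustration: term-frequency keyword vectors, cosine similarity, a simple leader-style threshold rule standing in for the Automatic Clustering algorithm, and the member closest to the cluster centroid as the representative. The names `keyword_features`, `IncrementalClusterer`, `threshold`, and the small English stopword list are hypothetical; Indonesian news would need an Indonesian stopword list.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "to", "and", "is", "for"}


def keyword_features(text):
    """Lowercase, tokenize, drop stopwords, and count term frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncrementalClusterer:
    """Leader-style clustering: the number of clusters is not fixed in advance.

    Assumed rule: an article joins the most similar cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.clusters = []  # each: {"centroid": Counter, "members": [(text, feats)]}

    def add(self, text):
        """Assign one incoming article and return its cluster index."""
        feats = keyword_features(text)
        best_idx, best_sim = None, 0.0
        for i, cluster in enumerate(self.clusters):
            sim = cosine(feats, cluster["centroid"])
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx is not None and best_sim >= self.threshold:
            cluster = self.clusters[best_idx]
            cluster["members"].append((text, feats))
            # Summed counts serve as the centroid; cosine ignores the scale.
            cluster["centroid"].update(feats)
            return best_idx
        self.clusters.append({"centroid": Counter(feats), "members": [(text, feats)]})
        return len(self.clusters) - 1

    def representative(self, idx):
        """Return the member article most similar to the cluster centroid."""
        cluster = self.clusters[idx]
        return max(cluster["members"], key=lambda m: cosine(m[1], cluster["centroid"]))[0]


news = [
    "Flood hits Jakarta after heavy rain",
    "Heavy rain causes flood in Jakarta",
    "Jakarta flood: heavy rain forces evacuation",
    "National team wins football match",
]
clusterer = IncrementalClusterer(threshold=0.3)
labels = [clusterer.add(article) for article in news]

# Mirror the evaluation setup in the abstract: skip single-member clusters.
multi_member = [c for c in clusterer.clusters if len(c["members"]) > 1]
print(labels, clusterer.representative(0), len(multi_member))
```

Because articles are assigned one at a time, the same `add` call covers both the initial clustering of a day's articles and later incremental updates. The threshold value of 0.3 is arbitrary and would need tuning on real data.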