A Method for Discovering and Obtaining Company Hot Events from Internet News

With the rapid development and popularization of the Internet, it has become the most convenient way to publish and obtain information, which has led to a dramatic increase in the quantity and variety of data. Finding potentially valuable information in these data is difficult, and this is the central problem of data mining. Hot events about a company, mined from Internet news, can effectively reflect how its business is operating. We therefore propose a method for discovering and obtaining company hot events from Internet news. In the proposed method, we modify the Single-Pass clustering algorithm by using a Gaussian kernel, instead of the global cluster, to update the cluster centers. The result is a dynamic, incremental clustering algorithm that does not require the number of clusters to be specified in advance. The Top-N hot events are then obtained from the cluster centers. Experimental comparisons show that the improved algorithm achieves higher clustering efficiency than the classic Single-Pass algorithm. Case studies of the Shanghai Pilot Free-Trade Zone (FTZ) also demonstrate the effectiveness of the proposed method.
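The abstract does not give the exact update rule, so the following is only a minimal Python sketch of kernel-weighted Single-Pass clustering under stated assumptions. Documents are assumed to be already encoded as dense vectors (for example TF-IDF), similarity is assumed to be cosine, and the Gaussian-kernel center update uses a hypothetical step of w/(n+w), where w is the kernel weight of the new document and n is the current cluster size. A cluster's "hotness" is assumed to be its document count. The names `single_pass`, `gaussian_weight`, `top_n_events`, `threshold`, and `sigma` are illustrative and do not come from the paper.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    center: list[float]
    members: list[int] = field(default_factory=list)  # indices of assigned documents


def gaussian_weight(x: list[float], center: list[float], sigma: float) -> float:
    """Gaussian kernel exp(-||x - c||^2 / (2 sigma^2)); close to 1 near the center."""
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq / (2 * sigma ** 2))


def single_pass(docs: list[list[float]], threshold: float = 0.5, sigma: float = 1.0) -> list[Cluster]:
    """Visit each document once: join the most similar cluster if the similarity
    reaches `threshold`, otherwise start a new cluster. The number of clusters
    is never fixed in advance."""
    clusters: list[Cluster] = []
    for i, x in enumerate(docs):
        best, best_sim = None, -1.0
        for c in clusters:
            s = cosine(x, c.center)
            if s > best_sim:
                best, best_sim = c, s
        if best is None or best_sim < threshold:
            clusters.append(Cluster(center=list(x), members=[i]))
            continue
        # Hypothetical kernel-weighted update: the center moves toward x by w / (n + w),
        # so documents far from the center (small w) shift it less than nearby ones.
        n = len(best.members)
        w = gaussian_weight(x, best.center, sigma)
        best.center = [(n * c + w * xi) / (n + w) for c, xi in zip(best.center, x)]
        best.members.append(i)
    return clusters


def top_n_events(clusters: list[Cluster], n: int = 3) -> list[Cluster]:
    """Rank clusters by size (an assumed hotness measure) and keep the top n."""
    return sorted(clusters, key=lambda c: len(c.members), reverse=True)[:n]


if __name__ == "__main__":
    # Toy 3-dimensional "document" vectors: two topics along the first two axes.
    docs = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.95, 0.05, 0.0],
        [0.0, 0.9, 0.1],
    ]
    for c in top_n_events(single_pass(docs, threshold=0.8), n=2):
        print(c.members)  # prints [0, 1, 3] then [2, 4]
```

Because each document is compared only against the current cluster centers, one pass over the news stream suffices, which is what makes the approach incremental. The kernel weight is one plausible way to keep outlying articles from dragging a center away from its event; the paper's own formulation may differ.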
