An Unsupervised Approach of Knowledge Discovery from Big Data in Social Network

Social networks are a common source of big data, and the sheer volume makes it increasingly difficult to understand what is happening in the network. Summarization is a useful approach for extracting meaningful information and identifying the underlying patterns in such data. However, existing clustering and frequent item-set based summarization techniques fail to produce meaningful summaries and do not represent the underlying data patterns. In this paper, the effectiveness of co-clustering is explored for creating meaningful summaries of social network data such as Twitter. Experimental results show that using co-clustering to create summaries provides a significant benefit over the existing techniques.

Received on 13 March 2017; accepted on 25 July 2017; published on 25 September 2017
