Infrequent pattern mining in smart healthcare environment using data summarization

A summarization technique creates a concise version of a large amount of data (big data), which reduces the computational cost of analysis and decision-making. Some interesting data patterns, such as rare anomalies, occur far less frequently than other data instances. For example, in a smart healthcare environment, the proportion of infrequent patterns in the underlying cyber-physical system (CPS) is very low. Existing summarization techniques overlook the problem of representing such infrequent patterns in a summary. In this paper, a novel clustering-based technique is proposed which uses an information-theoretic measure to identify the infrequent patterns for inclusion in a summary. Experiments conducted on seven benchmark CPS datasets show that the proposed technique includes infrequent patterns in summaries substantially better than existing techniques.
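To make the idea concrete, the following is a minimal sketch of how an information-theoretic weight could favor small clusters when building a summary. It is not the algorithm proposed in the paper, whose details are not given in this abstract. It assumes that cluster labels have already been produced by some clustering step, and it uses self-information, -log2(p), as a stand-in for the information-theoretic measure. The function name `summarize_by_self_information` and the `budget` parameter are hypothetical.

```python
import math
import random
from collections import defaultdict


def summarize_by_self_information(records, labels, budget, seed=0):
    """Sample roughly `budget` records, weighting clusters by self-information.

    Assumption (not from the paper): a cluster holding a fraction p of the
    data receives a share of the budget proportional to -log2(p), so rare
    clusters get more slots than proportional sampling would give them.
    """
    rng = random.Random(seed)

    # Group the records by their (precomputed) cluster label.
    clusters = defaultdict(list)
    for record, label in zip(records, labels):
        clusters[label].append(record)

    total = len(records)
    # Self-information of each cluster: the smaller the cluster, the larger the value.
    info = {c: -math.log2(len(m) / total) for c, m in clusters.items()}
    info_sum = sum(info.values())

    summary = []
    for c, members in clusters.items():
        if info_sum == 0:
            # Only one cluster exists, so it receives the whole budget.
            quota = budget
        else:
            # Every cluster keeps at least one representative.
            quota = max(1, round(budget * info[c] / info_sum))
        # A cluster cannot contribute more records than it contains.
        quota = min(quota, len(members))
        summary.extend(rng.sample(members, quota))
    return summary


if __name__ == "__main__":
    # Toy data: 98 "normal" readings and 2 "rare" ones, already clustered.
    records = [("normal", i) for i in range(98)] + [("rare", 0), ("rare", 1)]
    labels = [0] * 98 + [1] * 2
    for item in summarize_by_self_information(records, labels, budget=10):
        print(item)
```

With this weighting, a cluster holding 2% of the data is allotted almost the entire budget and is capped only by its own size, whereas uniform random sampling of ten records from the same data would often miss it entirely. The summary size is approximate because the per-cluster quotas are rounded and capped.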
