The Concept of Detecting and Classifying Anomalies in Large Data Sets on a Basis of Information Granules

Anomaly (outlier) detection is one of the most important problems in modern data analysis. Anomalies can result from database users' mistakes, operational errors, or simply missing values. The problem is becoming more pressing as large data sets grow rapidly. We therefore present the initial results of work on a Granular Computing approach to data imputation and missing data analysis. Our proposal yields intuitive and interpretable solutions. Finally, in a series of experiments, we demonstrate its effectiveness on a large data set from the transport domain.
