An evolving Takagi-Sugeno model based on aggregated trapezium clouds for anomaly detection in large datasets

Anomaly detection is an important task for applications involving Big Data. Compared with traditional settings, anomaly detection in Big Data confronts growing volumes of data with high dimensionality and complex structures, which demand analysis closer to real time. This paper presents a fuzzy input-output system for detecting anomalous data that combines electronic consumer records (ECRs), a trapezium-cloud-map-filtration (TCMF) framework and a value mining model. ECRs are used to add or remove criteria based on consumers’ consumption. Within the MapReduce framework, a trapezium cloud is generated from each subsample, these clouds are aggregated, and the aggregated trapezium cloud is then used as a filter for each subsample. A fuzzy logic-based value mining model is then proposed based on the Takagi-Sugeno model (T-S model) and trapezium clouds. The resulting system can improve decision-making accuracy by filtering large-scale data, and an illustrative example based on a hotel booking scenario is presented to verify the validity and feasibility of the proposed model. Finally, a comparative analysis is conducted between the proposed approach and existing methods.
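
The map, aggregate and filter steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the cloud-fitting rule, the aggregation operator or the filtering threshold, so everything below is an assumption made for illustration. A trapezium cloud is reduced here to a deterministic triple `(ex_lo, ex_hi, en)` (core interval and entropy, with hyper-entropy ignored), the core is taken from the quartiles of each subsample, aggregation is a plain average of the parameters, and the names `fit_cloud`, `aggregate`, `membership` and `flag_anomalies` are hypothetical. The T-S value mining model is not covered.

```python
import math
from statistics import mean, pstdev, quantiles

# A simplified, deterministic "trapezium cloud": (ex_lo, ex_hi, en).
# ASSUMPTION: hyper-entropy is ignored, so membership is not randomised.


def fit_cloud(sample):
    """Map step (assumed rule): the core is the interquartile range,
    the entropy is the population standard deviation of the subsample."""
    q1, _, q3 = quantiles(sample, n=4)
    return (q1, q3, pstdev(sample))


def aggregate(clouds):
    """Aggregation step (assumed rule): average each parameter."""
    return tuple(mean(c[i] for c in clouds) for i in range(3))


def membership(x, cloud):
    """Membership is 1 inside the core and decays like a Gaussian
    with spread `en` on either side of it."""
    ex_lo, ex_hi, en = cloud
    if ex_lo <= x <= ex_hi:
        return 1.0
    if en == 0:
        return 0.0
    edge = ex_lo if x < ex_lo else ex_hi
    return math.exp(-((x - edge) ** 2) / (2 * en ** 2))


def flag_anomalies(sample, cloud, threshold=0.05):
    """Filter step: keep the points whose membership in the aggregated
    cloud falls below `threshold` (a hypothetical cut-off)."""
    return [x for x in sample if membership(x, cloud) < threshold]


if __name__ == "__main__":
    subsamples = [[10, 11, 12, 13, 14], [11, 12, 13, 14, 15]]
    shared = aggregate([fit_cloud(s) for s in subsamples])
    print(flag_anomalies([12, 60], shared))
```

In a real MapReduce job, `fit_cloud` would run in the mappers and `aggregate` in a reducer, with the aggregated cloud broadcast back to filter each subsample; the sketch runs all three steps in one process for readability.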
