Mining frequent pyramid patterns from time series transaction data with custom constraints

Abstract For mining pyramid scheme patterns, the traditional sequential pattern mining algorithm Prefixspan has several disadvantages, such as poor timeliness and a single uniform threshold. We therefore propose a timeliness, variable-threshold and incremental Prefixspan algorithm, named TVI-Prefixspan, for mining sequential patterns from time series transaction data. Specifically, TVI-Prefixspan aims to mine patterns that occur with high frequency both within an individual sequence and across different sequences. The main challenge is how to define the frequency thresholds for frequent one-items and for pyramid patterns. First, we analyze the attributes of the patterns hidden in the financial activities between different bank accounts. Second, the frequency threshold of each one-item is determined by the difference between its frequency in normal transaction sequences and its frequency in pyramid-related transaction sequences. We also consider the special relationships, in both numerical value and time, between the items of each pattern. TVI-Prefixspan thus produces the frequent one-item set based on each item's deviation from its normal frequency, and then mines the pyramid patterns under formulated relation constraints. To describe the correlation between items, we consider sequential, time-interval and one-off constraints simultaneously. Experimental results on real financial data containing pyramid transactions show that TVI-Prefixspan mines pyramid scheme patterns quickly and effectively, and that it outperforms traditional sequential pattern mining algorithms such as Prefixspan in both efficiency and mining quality.
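The two ideas named in the abstract, a per-item threshold based on the frequency difference between normal and pyramid-related sequences, and pattern counting under sequential, time-interval and one-off constraints, can be illustrated with a minimal Python sketch. This is not the authors' TVI-Prefixspan implementation. The function names, the `min_gap` and `max_interval` parameters, the greedy leftmost matching strategy and the toy item labels are all assumptions made for illustration, and each sequence is assumed to be a time-sorted list of `(item, timestamp)` pairs.

```python
from collections import Counter


def item_support(sequences):
    """Fraction of sequences that contain each item at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update({item for item, _ in seq})
    n = len(sequences)
    return {item: c / n for item, c in counts.items()}


def frequent_items(normal, suspect, min_gap=0.2):
    """Keep items whose support in the suspect sequences exceeds their
    support in the normal sequences by at least min_gap (hypothetical rule)."""
    s_norm = item_support(normal)
    s_susp = item_support(suspect)
    return {
        item
        for item, s in s_susp.items()
        if s - s_norm.get(item, 0.0) >= min_gap
    }


def count_one_off(seq, pattern, max_interval):
    """Count occurrences of pattern in one time-sorted sequence.

    Sequential constraint: items must appear in pattern order.
    Time-interval constraint: consecutive matched events are at most
    max_interval apart.
    One-off constraint: each event is used by at most one occurrence.
    Matching is greedy (leftmost), so the count is a lower bound.
    """
    used = [False] * len(seq)
    count = 0
    for start, (item, t) in enumerate(seq):
        if used[start] or item != pattern[0]:
            continue
        matched = [start]
        last_t = t
        pos = start + 1
        for want in pattern[1:]:
            found = None
            for j in range(pos, len(seq)):
                item_j, t_j = seq[j]
                if t_j - last_t > max_interval:
                    break  # later events are even further away in time
                if not used[j] and item_j == want:
                    found = j
                    break
            if found is None:
                break
            matched.append(found)
            last_t = seq[found][1]
            pos = found + 1
        if len(matched) == len(pattern):
            for j in matched:
                used[j] = True
            count += 1
    return count


# Toy data: "in" and "out" are hypothetical transaction labels.
normal = [[("a", 0), ("b", 1)], [("a", 0)]]
suspect = [[("a", 0), ("c", 1)], [("c", 0), ("b", 2)]]
selected = frequent_items(normal, suspect, min_gap=0.2)

events = [("in", 0), ("out", 1), ("in", 2), ("out", 3), ("out", 10)]
occurrences = count_one_off(events, ("in", "out"), max_interval=2)
```

In this toy run only item `c` passes the contrast threshold, and the pattern `("in", "out")` is counted twice: the event at time 10 is never reached because both earlier `in` events have already been paired with a closer `out`.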
