Anytime algorithm for frequent pattern outlier detection

Outlier detection consists of identifying anomalous observations in data. Over the past decade, several outlier detection methods have been proposed that rely on frequent patterns. Such methods must mine all frequent patterns in order to compute the outlier factor of each transaction. Despite recent progress in pattern mining, this approach remains too expensive to return results within a response time of a few seconds. In this paper, we provide the first anytime method for calculating the frequent pattern outlier factor (FPOF). This method, which the end user can interrupt at any time, accurately approximates FPOF by mining a sample of patterns. It also computes the maximum error on the estimated FPOF, which helps the user stop the process at the right time. Experiments show the value of this method on very large datasets, where exhaustive mining fails to provide good approximate solutions. For the same budget in number of patterns, our anytime approximate method is more accurate than the baseline approach.
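As an illustration of approximating FPOF from a sample of patterns, the following Python sketch is one possible reading of the idea and not the method of the paper. It rests on several assumptions that the abstract does not state: every itemset counts (no minimum support threshold), patterns are drawn with probability proportional to their support using a two-step random procedure, and the score of a transaction is the share of sampled patterns it contains, normalized by the largest share. The names `sample_pattern` and `anytime_fpof` are hypothetical, and the maximum-error computation mentioned above is omitted.

```python
import random


def sample_pattern(transactions, weights, rng):
    """Draw one itemset with probability proportional to its support.

    Assumed two-step procedure: pick a transaction t with probability
    proportional to 2^|t|, then pick a uniform random subset of t.
    """
    t = rng.choices(transactions, weights=weights, k=1)[0]
    return frozenset(item for item in t if rng.random() < 0.5)


def anytime_fpof(transactions, budget, seed=0):
    """Approximate an FPOF-like score for each transaction.

    `budget` is the number of sampled patterns. The loop can be stopped
    after any iteration; more samples give a better estimate.
    """
    rng = random.Random(seed)
    weights = [2 ** len(t) for t in transactions]
    hits = [0] * len(transactions)
    for _ in range(budget):
        pattern = sample_pattern(transactions, weights, rng)
        for i, t in enumerate(transactions):
            if pattern <= t:  # the sampled pattern occurs in transaction i
                hits[i] += 1
    top = max(hits) or 1  # guard against division by zero when budget is 0
    return [h / top for h in hits]


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("xyz")]
    for t, score in zip(data, anytime_fpof(data, budget=2000)):
        print(sorted(t), round(score, 3))
```

Under these assumptions, a transaction that shares few patterns with the rest of the data, such as `xyz` in the toy data, receives a low score and is a candidate outlier.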
