Anomaly Detection in Vertically Partitioned Data by Distributed Core Vector Machines

Observations of physical processes are subject to instrument malfunction and noise, and therefore require data cleansing. However, rare events must not be excluded from modeling, since they can be the most interesting findings. Often, sensors collect features at different sites, so that each site holds only a subset of the features (vertically distributed data). Transferring all data, or even a sample, to a single location is impossible in many real-world applications because communication bandwidth is restricted. Finding interesting abnormalities thus requires efficient methods for distributed anomaly detection. We propose a new algorithm for anomaly detection on vertically distributed data. It aggregates the data directly at the local storage nodes using RBF kernels. Only a fraction of the data is communicated to a central node. Through extensive empirical evaluation on controlled datasets, we demonstrate that our method is an order of magnitude more communication-efficient than state-of-the-art methods while achieving comparable accuracy.
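As an illustration of why RBF kernels suit vertically partitioned data, the following minimal sketch checks a standard property: an RBF kernel on the full feature vector equals the product of RBF kernels computed separately on each node's feature subset, because the squared Euclidean distance is a sum over features. This is not the proposed algorithm itself; the two-node split, the value of `gamma`, the synthetic data, and all function and variable names are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))   # 6 observations, 5 features (synthetic)
gamma = 0.5                   # assumed kernel width

# Hypothetical vertical partition: node 0 holds features 0-1, node 1 holds 2-4.
node_features = [X[:, :2], X[:, 2:]]

# Each node computes a kernel matrix on its own features only.
K_local = [rbf_kernel(F, F, gamma) for F in node_features]

# The element-wise product of the local kernels matches the centralized kernel.
K_product = np.prod(K_local, axis=0)
K_full = rbf_kernel(X, X, gamma)

print(np.allclose(K_full, K_product))
```

In this sketch every node still produces a full kernel matrix; communicating only a fraction of the data, as the abstract describes, would require an additional selection step that is not shown here.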
