A Peer Dataset Comparison Outlier Detection Model Applied to Financial Surveillance

Outlier detection is a key element of intelligent financial surveillance systems. Detection procedures generally fall into two categories: comparing each transaction against its account's history, and comparing it against a peer group to determine whether the behavior is unusual. The latter approach is particularly effective at extracting suspicious transactions efficiently and reducing the false positive rate. Peer group analysis depends largely on a cross-dataset outlier detection model. In this paper, we propose a new cross-outlier detection model based on a distance definition that incorporates features of financial transaction data. An approximation algorithm accompanying the model is provided to optimize the computation of the deviation of a tested data point from the reference dataset. An experiment on real bank data blended with synthetic outlier cases shows promising results: the model reduces the false positive rate while markedly improving the discriminative rate.
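
To make the idea of scoring a tested point against a reference (peer) dataset concrete, the following is a minimal sketch of a generic distance-based cross-outlier score. It is not the model or the approximation algorithm proposed in this paper: the k-nearest-neighbour score, the median-based threshold, the function names, and the toy feature values are all assumptions chosen for illustration, and the search is brute force.

```python
import math
import statistics


def knn_cross_distance(point, reference, k=2):
    """Mean Euclidean distance from `point` to its k nearest points in `reference`."""
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    dists = sorted(math.dist(point, r) for r in reference)
    return sum(dists[:k]) / k


def cross_outlier_flags(tested, reference, k=2, factor=3.0):
    """Flag tested points that deviate strongly from the reference dataset.

    A tested point is flagged when its score exceeds `factor` times the
    median leave-one-out score of the reference points themselves.
    Both `k` and `factor` are illustrative, hypothetical parameters.
    """
    reference = list(reference)
    baseline = [
        knn_cross_distance(r, reference[:i] + reference[i + 1:], k)
        for i, r in enumerate(reference)
    ]
    typical = statistics.median(baseline)
    return [knn_cross_distance(p, reference, k) > factor * typical for p in tested]


# Toy peer group: (transaction amount, transactions per day).
# Real features would normally be rescaled before computing distances.
peer_group = [(100.0, 2.0), (110.0, 3.0), (95.0, 2.0), (105.0, 2.0), (102.0, 3.0)]
candidates = [(101.0, 2.0), (900.0, 15.0)]

print(cross_outlier_flags(candidates, peer_group))
```

In this toy setting the first candidate sits inside the peer group and the second lies far from every reference point, so only the second should be flagged. Each tested point costs one pass over the reference set, which is the kind of computation an approximation algorithm would aim to reduce.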
