Tradeoffs in streaming binary classification under limited inspection resources

Institutions increasingly rely on machine learning models to identify and alert on abnormal events such as fraud, cyber attacks, and system failures. These alerts often have to be investigated manually by specialists. Because manual inspection is operationally costly, alerting systems select suspicious events using carefully designed thresholds. In this paper, we consider an imbalanced binary classification problem in which events arrive sequentially and only a limited number of suspicious events can be inspected. We model event arrivals as a non-homogeneous Poisson process and compare several methods for selecting suspicious events, including methods based on static and adaptive thresholds. For each method, we analytically characterize the tradeoff between the minority-class detection rate and the inspection capacity as a function of the class imbalance and the classifier's confidence score densities. We implement the selection methods on a real public fraud detection dataset and compare the empirical results with the analytical bounds. Finally, we investigate how class imbalance and the choice of classifier affect the tradeoff.
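To make the setting concrete, the following is a minimal sketch and not the paper's implementation. It simulates arrivals from a non-homogeneous Poisson process by thinning, assigns synthetic confidence scores, and applies a static threshold under a fixed inspection capacity. All function names, the sinusoidal intensity, the 5% minority fraction, and the Beta score densities are assumptions chosen for illustration. The hindsight top-k selection is included only as a reference point that a sequential method cannot achieve in general; it is not one of the paper's analytical bounds.

```python
import math
import random


def simulate_nhpp(rate, rate_max, horizon, rng):
    """Arrival times on [0, horizon) via thinning; requires rate(t) <= rate_max."""
    t, arrivals = 0.0, []
    while True:
        # Candidate arrivals come from a homogeneous process of rate rate_max.
        t += rng.expovariate(rate_max)
        if t >= horizon:
            return arrivals
        # Keep each candidate with probability rate(t) / rate_max.
        if rng.random() < rate(t) / rate_max:
            arrivals.append(t)


def static_threshold_select(scores, threshold, capacity):
    """Inspect events in arrival order whose score meets the threshold,
    stopping once the inspection capacity is exhausted."""
    selected = []
    for i, score in enumerate(scores):
        if len(selected) >= capacity:
            break
        if score >= threshold:
            selected.append(i)
    return selected


def top_k_select(scores, capacity):
    """Hindsight reference: the `capacity` highest-scoring events."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:capacity])


def detection_rate(selected, labels):
    """Fraction of minority-class (label 1) events that were inspected."""
    positives = sum(labels)
    if positives == 0:
        return 0.0
    return sum(labels[i] for i in selected) / positives


rng = random.Random(0)

# Assumed intensity: a sinusoidal daily-style cycle, bounded above by 300.
arrivals = simulate_nhpp(
    rate=lambda t: 200.0 * (1.0 + 0.5 * math.sin(2.0 * math.pi * t)),
    rate_max=300.0,
    horizon=1.0,
    rng=rng,
)

# Assumed class imbalance (5% minority) and score densities (Beta).
labels = [1 if rng.random() < 0.05 else 0 for _ in arrivals]
scores = [
    rng.betavariate(5, 2) if y == 1 else rng.betavariate(2, 5) for y in labels
]

capacity = 10
static = static_threshold_select(scores, threshold=0.7, capacity=capacity)
oracle = top_k_select(scores, capacity)

print("events:", len(arrivals), "minority events:", sum(labels))
print("static threshold detection rate:", detection_rate(static, labels))
print("hindsight top-k detection rate:", detection_rate(oracle, labels))
```

Varying `threshold` and `capacity` in this sketch shows the tension described above: a low threshold spends the capacity early on weak alerts, while a high threshold may leave capacity unused.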
