Application of Benford's Law in Data Analysis

In the era of big data, the volume of data is growing explosively, and data quality control has become a key factor in maximizing the value of data. Establishing scientific methods for detecting data quality problems is therefore both important and urgent. Benford's law has become an effective tool for assessing data quality and identifying anomalous data in many fields. After setting out the basic principles of Benford's law, this paper surveys its applications across the natural and social sciences. It then examines the conditions a dataset must satisfy for the law to apply and the factors that affect detection accuracy. Finally, it proposes improvements to the model in three respects: extending the scope of detection, combining several methods, and strengthening the interpretation of detection results. Future development of Benford's law will require scholars from all fields to study its underlying nature more deeply and to integrate it more closely with other data processing techniques, thereby expanding its applications.
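As a rough illustration of how Benford's law serves as a data quality check, the sketch below compares the first significant digits of a dataset with the Benford distribution P(d) = log10(1 + 1/d), d = 1, ..., 9. The abstract does not specify a particular test, so this example assumes Pearson's chi-square goodness-of-fit test, which is one common choice, at the 5% level with 8 degrees of freedom. The names `first_digit`, `benford_chi_square` and `flag_nonconformity` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter

# Benford's law: P(first significant digit = d) = log10(1 + 1/d), d = 1..9.
BENFORD_FIRST_DIGIT = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Assumed decision rule: chi-square critical value for 8 degrees of freedom
# (9 digit classes minus 1) at the 5% significance level.
CHI2_CRITICAL_8DF_05 = 15.507


def first_digit(x):
    """Return the first significant digit of an int or float, or None if none."""
    # str() of an int or float writes its significant digits before any
    # exponent, so the first nonzero digit character is the leading digit.
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None  # zero, NaN or infinity carry no significant digit


def benford_chi_square(values):
    """Pearson chi-square statistic of observed first digits vs. Benford's law."""
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no values with a significant digit")
    counts = Counter(digits)  # missing digits count as 0
    return sum(
        (counts[d] - n * p) ** 2 / (n * p)
        for d, p in BENFORD_FIRST_DIGIT.items()
    )


def flag_nonconformity(values):
    """True if first digits deviate significantly from Benford's law."""
    return benford_chi_square(values) > CHI2_CRITICAL_8DF_05


if __name__ == "__main__":
    # Powers of 2 are a classic Benford-conforming sequence.
    powers_of_two = [2 ** k for k in range(1, 1001)]
    # Numbers whose leading digits are uniform should be flagged.
    uniform_digits = list(range(1, 10)) * 100
    print(flag_nonconformity(powers_of_two), flag_nonconformity(uniform_digits))
```

A flag from such a test only indicates that the digit pattern departs from Benford's law. Whether that departure reflects manipulation, error, or simply data that do not meet the law's conditions (for example, values confined to a narrow range) still has to be interpreted, which is the issue the abstract raises under strengthening the explanation of detection results.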
