An Unsupervised Density-Based Clustering Algorithm to Detect Election Anomalies: Evidence from Georgia’s Largest County

The 2020 U.S. presidential election was fraught with allegations of fraud. In response to the lack of a robust method for investigating such allegations, we propose a multi-step, clustering-based approach. We first solve a regression problem to identify a group of influential variables, then cluster precincts on these variables to obtain groups that should have similar election results. Re-clustering each of these groups then reveals the outliers. We apply the approach to Fulton County, Georgia’s largest county and an epicenter of allegations of corruption and fraud. We show that the level of fraud detected is not significant and would not be enough to change the election results in Georgia. In fact, the majority of the precincts flagged as anomalous were ones where Trump received more votes than expected. We also validate our analysis by applying it to the 2015 Argentine national election.
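The three steps described above (select influential variables, cluster precincts on them, re-cluster each group to expose outliers) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the regression model or the clustering algorithm, so the sketch assumes a correlation ranking as a stand-in for the regression step and a plain DBSCAN as the density-based method. The function names, the toy precinct data, and the parameters `k`, `eps_x`, `eps_y`, and `min_pts` are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(rows, target, k):
    """Step 1 (assumed stand-in for the regression step):
    keep the k columns most correlated with the target."""
    n_cols = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], target)) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: -scores[j])[:k]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise becomes a border point
            if labels[j] is not None:
                continue                # already assigned, do not expand
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:     # core point: keep expanding
                queue.extend(nbj)
    return labels

def flag_anomalies(rows, target, k, eps_x, eps_y, min_pts):
    """Steps 2 and 3: group precincts on the selected variables, then
    re-cluster each group on the outcome and report its noise points."""
    cols = select_features(rows, target, k)
    feats = [[r[j] for j in cols] for r in rows]
    groups = dbscan(feats, eps_x, min_pts)
    flagged = []
    for g in set(groups) - {-1}:
        members = [i for i, lab in enumerate(groups) if lab == g]
        inner = dbscan([[target[i]] for i in members], eps_y, min_pts)
        flagged += [i for i, lab in zip(members, inner) if lab == -1]
    return sorted(flagged)

# Hypothetical toy data: two explanatory variables plus one noise column.
rows = [
    [0.20, 0.30, 0.5], [0.22, 0.31, 0.1], [0.21, 0.29, 0.9],
    [0.19, 0.30, 0.3], [0.20, 0.32, 0.7],
    [0.80, 0.70, 0.4], [0.81, 0.71, 0.8], [0.79, 0.69, 0.2],
    [0.82, 0.70, 0.6], [0.80, 0.72, 0.0],
]
# Hypothetical vote shares; precinct 4 deviates from its look-alike precincts.
share = [0.30, 0.31, 0.29, 0.30, 0.60, 0.70, 0.71, 0.69, 0.70, 0.71]

print(flag_anomalies(rows, share, k=2, eps_x=0.1, eps_y=0.05, min_pts=3))  # → [4]
```

In this toy run, precinct 4 sits in the same group as precincts 0 to 3 on the selected variables but reports a very different vote share, so the second clustering pass marks it as noise. On real data the variables would need standardizing before distances are computed, and the neighborhood radii would need tuning.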
