FAIR-DB: FunctionAl DependencIes to discoveR Data Bias

Computers and algorithms have become essential tools that pervade all aspects of our daily lives; this technology is based on data and, for it to be reliable, we have to make sure that the data on which it is based is fair and unbiased. In this context, fairness has become a relevant topic of discussion within the field of Data Science Ethics, and more generally in Data Science. Today's applications should therefore be accompanied by tools that discover bias in data, in order to avoid (possibly unintentional) unethical behavior and its consequences; as a result, technologies that accurately discover discrimination and bias in databases are of paramount importance. In this work we propose FAIR-DB (FunctionAl DependencIes to discoveR Data Bias), a novel solution to detect bias and discover discrimination in datasets, which exploits the notion of Functional Dependency, a particular type of constraint on the data stating that the values of some attributes uniquely determine the values of others. The proposed solution is implemented as a framework that focuses on mining such dependencies and also proposes new metrics for evaluating the bias found in the input dataset. Our tool can identify the attributes of the database that encode discrimination (e.g., gender, ethnicity, or religion) and those that instead satisfy various fairness measures; moreover, thanks to special aspects of these metrics and the intrinsic nature of dependencies, the framework provides very precise information about the groups treated unequally, yielding more insights into the bias present in the dataset than other existing tools. Finally, our system also suggests possible next steps, by indicating the most appropriate (already existing) algorithms to correct the dataset on the basis of the computed results.
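To make the underlying idea concrete, the following minimal Python sketch (our illustration, not the FAIR-DB implementation; the column names, toy data, and helper function are assumptions) shows how the confidence of a conditional dependency pattern can flag a protected attribute as a candidate source of bias:

```python
# A minimal sketch (not the authors' implementation) of how a conditional
# functional dependency (CFD) pattern can surface bias: we measure the
# confidence of a pattern such as (sex = "Female") -> (income = "<=50K")
# and compare it with the confidence of the same consequent for the
# complementary group. Column names and toy data are illustrative.
import pandas as pd

def cfd_confidence(df: pd.DataFrame, antecedent: dict, consequent: dict) -> float:
    """Confidence of antecedent -> consequent on df:
    P(consequent holds | antecedent holds)."""
    mask = pd.Series(True, index=df.index)
    for col, val in antecedent.items():
        mask &= df[col] == val
    support = mask.sum()
    if support == 0:
        return 0.0
    satisfied = mask.copy()
    for col, val in consequent.items():
        satisfied &= df[col] == val
    return satisfied.sum() / support

# Toy data standing in for a real dataset (e.g., census-like records).
df = pd.DataFrame({
    "sex":    ["Female", "Female", "Female", "Male", "Male", "Male"],
    "income": ["<=50K",  "<=50K",  ">50K",   ">50K", ">50K", "<=50K"],
})

conf_f = cfd_confidence(df, {"sex": "Female"}, {"income": "<=50K"})
conf_m = cfd_confidence(df, {"sex": "Male"},   {"income": "<=50K"})
print(f"confidence(Female -> <=50K) = {conf_f:.2f}")
print(f"confidence(Male   -> <=50K) = {conf_m:.2f}")
# A large gap between the two confidences suggests that "sex" is associated
# with the outcome, which is the intuition behind FD-based bias checks.
```

In this toy example the two confidences differ markedly (0.67 vs. 0.33), so the dependency pattern on "sex" would be flagged for further inspection; a framework such as FAIR-DB mines many such patterns automatically and scores them with dedicated metrics.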
