An Interpretable Graph-based Mapping of Trustworthy Machine Learning Research

There is increasing interest in ensuring that machine learning (ML) frameworks behave in a socially responsible manner and are deemed trustworthy. Although considerable progress has been made in the field of Trustworthy ML (TwML) in recent years, much of the current characterization of this progress is qualitative. Consequently, decisions about how to address issues of trustworthiness and which research goals to pursue are often left to the individual researcher. In this paper, we present the first quantitative approach to characterizing the comprehensiveness of TwML research. We build a co-occurrence network of words from a web-scraped corpus of more than 7,000 recent peer-reviewed ML papers, including papers both related and unrelated to TwML. We apply community detection to this network to obtain semantic clusters of words, from which the relative positions of TwML topics can be inferred. We propose a fingerprinting algorithm that assigns probabilistic similarity scores to individual words and then combines them into a paper-level relevance score. The outcomes of our analysis yield a number of insights for advancing the field of TwML research.
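As a rough illustration of the first step described above, a word co-occurrence network can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `cooccurrence_network`, the `min_count` threshold, whitespace tokenization, and the toy documents are all hypothetical, and the abstract does not specify how words are extracted or how edges are weighted. Here an edge weight simply counts the documents in which two words appear together.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents, min_count=1):
    """Return {(word_a, word_b): weight} for word pairs that share a document.

    Hypothetical helper: the weight is the number of documents containing
    both words; pairs are stored in alphabetical order.
    """
    edges = Counter()
    for text in documents:
        # Assumption: naive lowercase whitespace tokenization, one count per document.
        words = sorted(set(text.lower().split()))
        edges.update(combinations(words, 2))
    # Keep only pairs that co-occur in at least `min_count` documents.
    return {pair: weight for pair, weight in edges.items() if weight >= min_count}


# Toy corpus standing in for paper titles or abstracts (invented for illustration).
docs = [
    "fairness bias classifier",
    "privacy differential classifier",
    "fairness bias causal",
]
network = cooccurrence_network(docs)
print(network[("bias", "fairness")])  # co-occur in two of the three documents
```

The resulting weighted edge list could then be handed to a community detection routine of choice to group words into semantic clusters; that step, and the word-level and paper-level scoring, are left out because the abstract gives no details to base them on.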
