Generative Adversarial Networks for Bitcoin Data Augmentation

In Bitcoin entity classification, results depend strongly on the ground-truth dataset, especially when supervised machine learning approaches are applied. However, these ground-truth datasets frequently suffer from significant class imbalance, as they generally contain far more information about legal services (Exchange, Gambling) than about services that may be related to illicit activities (Mixer, Service). Class imbalance makes machine learning techniques harder to apply and reduces the quality of classification results, especially for underrepresented but critical classes.

In this paper, we propose to address this problem by using Generative Adversarial Networks (GANs) for Bitcoin data augmentation, as GANs have recently shown promising results in the domain of image classification. However, there is no “one-size-fits-all” GAN solution that works for every scenario. In fact, setting GAN training parameters is non-trivial and heavily affects the quality of the generated synthetic data. We therefore evaluate how GAN parameters such as the optimization function, the size of the dataset, and the chosen batch size affect a GAN implementation for one underrepresented entity class (Mining Pool), and we demonstrate how to obtain a “good” GAN configuration that achieves high similarity between synthetically generated and real Bitcoin address data. To the best of our knowledge, this is the first study presenting GANs as a valid tool for generating synthetic address data for data augmentation in Bitcoin entity classification.
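The parameter study described above can be pictured as a grid over the three named parameters. The following is a minimal sketch, not the configuration used in the paper: the function name `build_config_grid` and every candidate value (optimizer names, dataset sizes, batch sizes) are hypothetical placeholders chosen only for illustration.

```python
from itertools import product

# Hypothetical candidate values: the abstract names the three parameters
# but does not state which values were evaluated.
OPTIMIZERS = ["sgd", "rmsprop", "adam"]
DATASET_SIZES = [500, 1000, 2000]
BATCH_SIZES = [32, 64, 128]


def build_config_grid(optimizers, dataset_sizes, batch_sizes):
    """Return one dict per combination of the three GAN training parameters."""
    return [
        {"optimizer": opt, "dataset_size": n, "batch_size": b}
        for opt, n, b in product(optimizers, dataset_sizes, batch_sizes)
        if b <= n  # skip batches larger than the dataset they are drawn from
    ]


grid = build_config_grid(OPTIMIZERS, DATASET_SIZES, BATCH_SIZES)
print(len(grid), "configurations to evaluate")
```

Each entry in `grid` would then drive one GAN training run for the underrepresented class, after which the generated samples are compared against the real address data.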
