XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum

Blockchain-based cryptocurrencies have received extensive attention recently, and massive amounts of data are now stored on permissionless blockchains. Analyzing this data can bring substantial business value. However, the absence of well-processed, up-to-date blockchain datasets impedes big data analytics on blockchain data. To fill this gap, we collect and process up-to-date on-chain data from Ethereum, one of the most popular permissionless blockchains. We name the resulting well-processed Ethereum data XBlock-ETH; it consists of datasets on transactions, smart contracts, and cryptocurrencies (i.e., tokens). Partitioning and categorizing the collected raw Ethereum data into well-processed datasets is non-trivial, since the whole procedure requires sophisticated knowledge of both software engineering and big data analytics. We also present basic statistics and an exploration of each dataset, and we outline possible research opportunities based on XBlock-ETH. The data and code are released online.
