Automated Labeling of Unknown Contracts in Ethereum

Smart contracts have recently attracted interest from diverse fields, including law and finance. Ethereum in particular has grown rapidly to accommodate an entire ecosystem of contracts that run using its own crypto-currency. Smart contract developers can opt to verify their contracts so that any user can inspect and audit the code before executing the contract. However, the huge number of deployed smart contracts and the lack of supporting tools for their analysis make it very challenging to gain insight into this ecosystem, where code is executed through transactions that transfer value in a crypto-currency. To address this problem, we propose a framework that groups together similar contracts within the Ethereum network using only the contracts' publicly available compiled code, combining unsupervised clustering techniques with a seed set of verified contracts. We report qualitative and quantitative results on a dataset and provide the dataset and project code to the research community.
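As a rough illustration of the idea of grouping contracts by compiled code and carrying labels over from a verified seed set, the following minimal sketch may help. It is not the framework described here: the feature choice (EVM opcode histograms), the cosine similarity measure, the greedy threshold clustering, the majority-vote labeling, and all function names are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def opcode_histogram(bytecode_hex):
    """Count EVM opcodes, skipping the inline data that follows PUSH1..PUSH32."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    counts = Counter()
    i = 0
    while i < len(code):
        op = code[i]
        counts[op] += 1
        # PUSH1 (0x60) .. PUSH32 (0x7f) carry 1..32 bytes of immediate data.
        i += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(contracts, threshold=0.9):
    """Greedy clustering (an assumed stand-in for any unsupervised method):
    a contract joins the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative histogram, [addresses])
    for address, bytecode in contracts.items():
        hist = opcode_histogram(bytecode)
        for rep, members in clusters:
            if cosine(rep, hist) >= threshold:
                members.append(address)
                break
        else:
            clusters.append((hist, [address]))
    return [members for _, members in clusters]


def label_clusters(clusters, seed_labels):
    """Assign each cluster the majority label of its seed (verified) members, if any."""
    labelled = {}
    for members in clusters:
        votes = Counter(seed_labels[m] for m in members if m in seed_labels)
        label = votes.most_common(1)[0][0] if votes else None
        for m in members:
            labelled[m] = label
    return labelled
```

In this sketch, unverified contracts that fall into the same cluster as a labeled seed contract inherit its label, while clusters without any seed member stay unlabeled.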