MDS code constructions with small sub-packetization and near-optimal repair bandwidth

An (n, M) vector code C ⊆ 𝔽^n is a collection of M codewords, where the n elements (from the field 𝔽) of each codeword are referred to as code blocks. Assuming that 𝔽 ≅ 𝔹^e, the code blocks are treated as length-e vectors over the base field 𝔹; equivalently, the code is said to have sub-packetization level e. This paper addresses the problem of constructing MDS vector codes that enable exact reconstruction of each code block by downloading a small amount of information from the remaining code blocks. The repair bandwidth of a code measures the information flow from the remaining code blocks during the reconstruction of a single code block. This problem arises naturally in the context of distributed storage systems as the node repair problem [4]. Assuming that M = |𝔹|^(ke), the repair bandwidth of an MDS vector code is lower bounded by ((n − 1)/(n − k)) · e symbols (over the base field 𝔹), which is also referred to as the cut-set bound [4]. For all values of n and k, MDS vector codes that attain the cut-set bound with sub-packetization level e = (n − k)^⌈n/(n − k)⌉ are known in the literature [23,36]. This paper presents a construction of MDS vector codes that simultaneously ensures both small repair bandwidth and small sub-packetization level. The obtained codes have the smallest possible sub-packetization level e = O(n − k) for an MDS vector code and a repair bandwidth that is at most twice the cut-set bound. The paper then generalizes this construction so that the repair bandwidth of the obtained codes approaches the cut-set bound at the cost of an increased sub-packetization level. The constructions presented in this paper give MDS vector codes that are linear over the base field 𝔹.
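To make the quantities above concrete, the following minimal sketch evaluates the cut-set bound ((n − 1)/(n − k)) · e and the sub-packetization level (n − k)^⌈n/(n − k)⌉ quoted for the known bandwidth-optimal codes. It is only an arithmetic illustration, not part of the paper's construction: the function names are hypothetical, and the "naive repair" baseline (download k full code blocks and decode the whole codeword) is the standard MDS fallback rather than something stated in the text.

```python
from fractions import Fraction


def cut_set_bound(n: int, k: int, e: int) -> Fraction:
    """Lower bound on repair bandwidth, in symbols over the base field."""
    return Fraction(n - 1, n - k) * e


def naive_repair_bandwidth(k: int, e: int) -> int:
    """Baseline (assumption, not from the text): fetch k whole blocks of e symbols."""
    return k * e


def known_optimal_subpacketization(n: int, k: int) -> int:
    """Sub-packetization e = (n - k)^ceil(n / (n - k)) quoted for cut-set-optimal codes."""
    r = n - k
    exponent = -(-n // r)  # integer ceiling of n / r
    return r ** exponent


if __name__ == "__main__":
    n, k = 14, 10  # illustrative parameters, chosen only for this example
    e_opt = known_optimal_subpacketization(n, k)
    print("sub-packetization of cut-set-optimal codes:", e_opt)
    print("cut-set bound at that level:", cut_set_bound(n, k, e_opt))
    print("naive repair at that level:", naive_repair_bandwidth(k, e_opt))

    # Hypothetical small level e = n - k, i.e. taking the constant in O(n - k) to be 1.
    e_small = n - k
    print("cut-set bound at e = n - k:", cut_set_bound(n, k, e_small))
    print("twice the cut-set bound:", 2 * cut_set_bound(n, k, e_small))
    print("naive repair at e = n - k:", naive_repair_bandwidth(k, e_small))
```

For n = 14 and k = 10, the cut-set-optimal codes need e = 4^4 = 256, with a bound of 832 symbols against 2560 for naive repair. Under the illustrative assumption e = n − k = 4, the bound is 13 symbols, twice the bound is 26, and naive repair costs 40, which shows why a factor-two guarantee at small sub-packetization is still a substantial saving.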