An On-demand Serialization Mechanism for Trees

In the Big Data era, complex data structures are often too large to reside in main memory. Traditional serialization mechanisms can only read a tree from disk, or write a tree to disk, as a whole. When the tree becomes huge, the memory needed to hold it in its entirety becomes the bottleneck. To solve this problem, one needs to be able to read or write only part of the tree, and only when necessary. We propose an on-demand serialization mechanism that reads or writes tree nodes one at a time while keeping the logical structure of the tree intact. The mechanism is implemented in C++ in the GeDBIT (Generalized Distance-Based Index Tree) system. Empirical results demonstrate the functionality and efficiency of our mechanism.
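
To make the idea concrete, the following is a minimal C++ sketch of node-at-a-time serialization. It is an illustration under stated assumptions and not the GeDBIT implementation: the `Node` layout and the names `writeNode`, `readNode`, and `containsKey` are hypothetical. The sketch assumes that each node stores the file offsets of its children in place of in-memory pointers, so the logical structure is preserved on disk and any single node can be loaded without reading the rest of the tree. It writes raw bytes and is therefore not portable across machines with different byte orders.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical on-disk node: a key plus the file offsets of its children.
struct Node {
    std::int32_t key = 0;
    std::vector<std::int64_t> childOffsets;  // disk addresses, not pointers
};

// Append one node to the end of the file and return the offset where it starts.
// Children must be written before their parent so that their offsets are known.
std::int64_t writeNode(std::fstream& file, const Node& node) {
    file.clear();
    file.seekp(0, std::ios::end);
    const auto offset = static_cast<std::int64_t>(file.tellp());
    const auto count = static_cast<std::uint32_t>(node.childOffsets.size());
    file.write(reinterpret_cast<const char*>(&node.key), sizeof node.key);
    file.write(reinterpret_cast<const char*>(&count), sizeof count);
    if (count > 0) {
        file.write(reinterpret_cast<const char*>(node.childOffsets.data()),
                   static_cast<std::streamsize>(count * sizeof(std::int64_t)));
    }
    file.flush();
    return offset;
}

// Load exactly one node from the given offset; no other node is touched.
Node readNode(std::fstream& file, std::int64_t offset) {
    assert(offset >= 0);
    Node node;
    std::uint32_t count = 0;
    file.clear();
    file.seekg(offset);
    file.read(reinterpret_cast<char*>(&node.key), sizeof node.key);
    file.read(reinterpret_cast<char*>(&count), sizeof count);
    node.childOffsets.resize(count);
    if (count > 0) {
        file.read(reinterpret_cast<char*>(node.childOffsets.data()),
                  static_cast<std::streamsize>(count * sizeof(std::int64_t)));
    }
    return node;
}

// Depth-first search that holds only the current root-to-node path in memory.
bool containsKey(std::fstream& file, std::int64_t offset, std::int32_t key) {
    const Node node = readNode(file, offset);
    if (node.key == key) return true;
    for (std::int64_t child : node.childOffsets) {
        if (containsKey(file, child, key)) return true;
    }
    return false;
}
```

In this sketch, a caller opens a `std::fstream` in binary read/write mode, writes the leaves first, then writes each parent with the offsets returned for its children, and keeps only the root offset. A traversal then calls `readNode` for each node it visits, so memory use is bounded by the depth of the path being explored and not by the size of the tree.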