Taming Big Wide Tables: Layout Optimization based on Column Ordering

{ying.yan, jeche, moscitho}@microsoft.com

Different OS cache policies have no significant effect on the savings from column ordering
Significant savings under different row group size settings

Summary
• Column stores are widely used for efficient data analytics. However, the order of columns has received little attention, because the number of columns in a big table was believed to be small, usually fewer than one hundred.
• Our investigation shows that the order of columns can significantly affect I/O performance, especially when the table is big and wide.
• Our proposed column ordering algorithm, SCOA, shows up to a 50% efficiency gain on real production data and workloads.
• SCOA has been integrated into the Microsoft Bing log analysis pipeline.
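
To illustrate why column order can matter for I/O, the following is a minimal sketch of a toy cost model. It is not SCOA and is not taken from the poster. It assumes that the column chunks of one row group are stored contiguously in the given order, and that the cost of a query is the number of bytes skipped between the chunks it reads. The names seek_distance and workload_cost, the column sizes, and the two queries are all hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def seek_distance(order: Sequence[str], sizes: Dict[str, int],
                  wanted: Iterable[str]) -> int:
    """Bytes skipped between the column chunks a query reads in one row group.

    Assumption: chunks are laid out back to back in `order`, and the reader
    scans forward, skipping over chunks the query does not need.
    """
    wanted_set = set(wanted)
    offset = 0        # start offset of the current chunk
    skipped = 0       # total gap between consecutive chunks that are read
    prev_end = None   # end offset of the last chunk that was read
    for col in order:
        start, end = offset, offset + sizes[col]
        if col in wanted_set:
            if prev_end is not None:
                skipped += start - prev_end
            prev_end = end
        offset = end
    return skipped


def workload_cost(order: Sequence[str], sizes: Dict[str, int],
                  queries: List[List[str]]) -> int:
    """Sum of the skipped bytes over every query in the workload."""
    return sum(seek_distance(order, sizes, q) for q in queries)


# Hypothetical chunk sizes (bytes) and a two-query workload.
sizes = {"a": 10, "b": 40, "c": 10, "d": 40}
queries = [["a", "c"], ["b", "d"]]

naive_cost = workload_cost(["a", "b", "c", "d"], sizes, queries)
grouped_cost = workload_cost(["a", "c", "b", "d"], sizes, queries)
print(naive_cost, grouped_cost)
```

In this toy setting, placing columns that are read together next to each other removes the gaps entirely. A real ordering algorithm would have to trade off many overlapping queries across a much wider table, which is the problem the poster addresses.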