A Theoretically Grounded Benchmark for Evaluating Machine Commonsense