Ruby: Improving Hardware Efficiency for Tensor Algebra Accelerators Through Imperfect Factorization