A practical algorithm for faster matrix multiplication

This paper presents an algorithm for matrix multiplication based on a formula discovered by Pan [7]. For matrices of order up to 10 000, near-optimal tuning of the algorithm yields a rather clear non-recursive one- or two-level structure with an operation count comparable to that of the Strassen algorithm [9]. The algorithm requires less workspace and has better numerical stability than the Strassen algorithm, especially when the latter is used in Winograd's modification [2]. Moreover, its clearer and more flexible structure is potentially better suited to efficient implementation on modern supercomputers. Copyright © 1999 John Wiley & Sons, Ltd.
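
For orientation, the baseline against which the operation count is compared can be sketched as a single, non-recursive level of the Strassen algorithm, which replaces the 8 block products of the classical 2 x 2 block scheme with 7. This is a minimal illustration of the Strassen baseline only and not of the Pan-based algorithm presented in this paper. The helper names (mat_add, mat_sub, mat_mul, split, strassen_one_level) are hypothetical, and the sketch assumes square matrices of even order stored as lists of rows.

```python
def mat_add(X, Y):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    """Elementwise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_mul(X, Y):
    """Classical row-by-column matrix product."""
    inner = len(Y)
    cols = len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(X))]


def split(X):
    """Split a square matrix of even order into four equal blocks."""
    h = len(X) // 2
    X11 = [row[:h] for row in X[:h]]
    X12 = [row[h:] for row in X[:h]]
    X21 = [row[:h] for row in X[h:]]
    X22 = [row[h:] for row in X[h:]]
    return X11, X12, X21, X22


def strassen_one_level(A, B):
    """One level of Strassen's scheme: 7 block products instead of 8.

    Assumption: A and B are square and of the same even order.
    The block products themselves use the classical algorithm.
    """
    if len(A) % 2 != 0 or len(A) != len(B):
        raise ValueError("expected square matrices of the same even order")

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # The seven block products of Strassen's formula.
    M1 = mat_mul(mat_add(A11, A22), mat_add(B11, B22))
    M2 = mat_mul(mat_add(A21, A22), B11)
    M3 = mat_mul(A11, mat_sub(B12, B22))
    M4 = mat_mul(A22, mat_sub(B21, B11))
    M5 = mat_mul(mat_add(A11, A12), B22)
    M6 = mat_mul(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = mat_mul(mat_sub(A12, A22), mat_add(B21, B22))

    # Recombine the products into the four blocks of C = A * B.
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_add(mat_sub(M1, M2), M3), M6)

    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

With integer entries the result agrees exactly with the classical product, so comparing strassen_one_level(A, B) against mat_mul(A, B) on a small matrix of even order is a simple check. In floating-point arithmetic the two results generally differ by rounding error, which is where the stability comparison mentioned above becomes relevant.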