Optimizing Matrix Operations on a Parallel Multiprocessor with a Hierarchical Memory System