We have developed a scalable hierarchical parallelization framework for molecular dynamics (MD) simulation on emerging multicore clusters. The framework combines: (1) inter-node level parallelism by spatial decomposition using message passing; (2) intra-node (inter-core) level parallelism through a master/worker paradigm and cellular decomposition using critical section-free multithreading; and (3) intra-core level parallelism via single-instruction multiple-data (SIMD) techniques. Our multithreading scheme takes account of cache coherency to maximize performance. For data-level parallelism via SIMD, zero padding is used to solve the alignment issue for complex data type as array, and simple data-type reformatting is used to solve the alignment issue for data with irregular memory accessing. By combining a hierarchy of parallelism, the framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node weak-scaling parallel efficiency 0.975 on 32,768 BlueGene/P nodes and 0.985 on 106,496 BlueGene/L nodes; (2) inter-node strong-scaling parallel efficiency 0.90 on 32 dual quadcore AMD Opteron nodes and 0.94 on 32 dual quadcore Intel Xeon nodes; (3) inter-core multithread parallel efficiency 0.65 for the whole program (0.89 for two-body force calculation) for eight threads on a dual quadcore Xeon platform; and (4) SIMD speedup 1.35 for the whole program (1.42 for the twobody force calculation).