Detector simulation is one of the most CPU-intensive tasks in modern High Energy Physics. Its importance for detector design and efficiency estimation keeps growing, yet the number of events that can be simulated is often limited by the available computing resources. Various kinds of "fast simulation" have been developed to alleviate this problem. While successful, these are mostly ad hoc solutions that do not completely replace the need for detailed simulation. Both detailed and fast simulation codes share a common limitation: they cannot fully exploit the parallelism increasingly offered by new generations of CPUs. In the coming years we can reasonably expect growth both in the demand for detector simulation and in hardware parallelism, widening the gap between needs and available means. In past years, and indeed since the first simulation programs, several unsuccessful efforts have been made to exploit the "embarrassing parallelism" of simulation programs. After a careful study of the problem, and based on long experience with simulation codes, the authors have concluded that an entirely new approach is needed to exploit parallelism. This paper reviews the current prototyping work, covering both detailed and fast simulation use cases. Performance studies are presented, together with a roadmap for developing a new, full-fledged transport program. This program is intended to exploit parallelism efficiently in the physics and geometry computations, with steering mechanisms adapted to accommodate detailed and fast simulation in a single framework.