Accelerating MPSoC simulations on multi-core host architectures
Embedded multi-core platforms have gained significant importance over the past years, because they promise to satisfy the intensive computation needs of modern applications while keeping power consumption manageable. Exploiting such complex architectures fully requires powerful tools with a good simulator at their core. The known instruction set simulation techniques, which work well for single processors, therefore need to be revised so that they scale to massively parallel platforms.
Parallel SystemC simulation (parSC) splits a tightly coupled MPSoC simulation across the cores of a multi-core host to make use of the computation power such machines provide. parSC leverages the fact that, in a discrete event simulation of a largely parallel architecture, a considerable number of events happen at the same simulated time, so the corresponding portions of computation can be distributed to different host cores. The accuracy of the original simulation is fully maintained, and the speed-up is gained solely by exploiting the parallel computing power of the host. The execution model of parSC preserves the SystemC event loop. The loop is led by a master thread, which decides whether there is work to be offloaded to worker threads; if so, the workers execute the inner iteration loop of the main SystemC evaluation loop. Parallelization therefore takes place at the granularity of individual SystemC delta cycles. The execution is fully synchronous, enforcing the same global time on all SystemC processes. The implementation is optimized with techniques from the high-performance computing domain that reduce host inter-core communication latency, which makes fully interlocked simulations with fast instruction set simulators feasible.
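The execution model described above can be illustrated with a minimal sketch in plain C++. It is not the parSC implementation and does not use the SystemC library: the names Signal, Process, evaluate_parallel and simulate_ring are hypothetical, and worker threads are spawned anew for every delta cycle purely for brevity. The sketch assumes the usual evaluate/update separation, in which writes become visible only after all processes of a delta cycle have run, so the result does not depend on how the processes are distributed over the host threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a signal with evaluate/update semantics:
// reads return the current value, writes take effect in the update phase.
struct Signal {
    int cur = 0;
    int next = 0;
    int read() const { return cur; }
    void write(int v) { next = v; }
    void update() { cur = next; }
};

using Process = std::function<void()>;

// Evaluate phase of one delta cycle: the calling (master) thread hands
// slices of the runnable processes to worker threads, runs slice 0 itself,
// and waits for all workers before returning.
void evaluate_parallel(const std::vector<Process>& runnable, unsigned workers) {
    if (workers == 0) workers = 1;
    auto run_slice = [&runnable, workers](unsigned first) {
        for (std::size_t i = first; i < runnable.size(); i += workers) runnable[i]();
    };
    std::vector<std::thread> pool;
    for (unsigned w = 1; w < workers; ++w) pool.emplace_back(run_slice, w);
    run_slice(0);                          // master executes its own share
    for (std::thread& t : pool) t.join();  // synchronize before the update phase
}

// Toy system: n processes in a ring; each one reads its left neighbour's
// signal and writes that value plus one to its own signal.
std::vector<int> simulate_ring(std::size_t n, int delta_cycles, unsigned workers) {
    assert(n > 0);
    std::vector<Signal> sig(n);
    std::vector<Process> procs;
    for (std::size_t i = 0; i < n; ++i) {
        procs.push_back([&sig, i, n] {
            sig[i].write(sig[(i + n - 1) % n].read() + 1);
        });
    }
    for (int d = 0; d < delta_cycles; ++d) {
        evaluate_parallel(procs, workers);   // parallel evaluate phase
        for (Signal& s : sig) s.update();    // sequential update phase
    }
    std::vector<int> out;
    for (const Signal& s : sig) out.push_back(s.read());
    return out;
}
```

Calling simulate_ring(8, 5, 1) and simulate_ring(8, 5, 4) yields the same signal values, because each process writes only the pending value of its own signal and reads only current values, which stay fixed during the evaluate phase. A production kernel would presumably keep its workers alive and synchronize them with lower-latency primitives than thread creation and join.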
The mission of the parSC technology is to provide a fast, deterministic and accurate simulation framework that increases developer productivity. It has been designed with use cases in mind where the accuracy and/or determinism of the simulation must not be sacrificed, e.g. performance evaluation and debugging of race conditions in a simulated system. Research is ongoing regarding the adaptation of the technology to use cases without stringent accuracy and determinism requirements.
With the parSC kernel as a base, the focus of recent research has shifted to the programming model of SystemC models to be used inside a parallel simulator. In the beginning, experimentation was conducted using models that had been coded with the constraints of a parallel execution environment in mind. In the long run, though, a way to reuse an existing legacy code base in parallel simulation environments will be required.
Currently, research is conducted at ICE regarding the following closely related issues:
- How to guarantee deterministic behavior in parallel simulations
- How to achieve thread-safety in parallel simulations
In cooperation with Synopsys, Inc., programming models that fit the above requirements are being researched. At the same time, such programming models need to be simple enough that they do not impact the model creators' productivity, and they must allow existing legacy models to be adapted and employed in parallel simulations.
A programming model has been found that fulfills these requirements. It has been used to integrate models into a parallel version of the simulator of the EU FP7 project EURETILE. Inside the EURETILE platform, simple computational tiles communicate with peers arranged in a 3D torus network using dedicated network processors. Thanks to this programming model, models of the network processor and peripherals could be integrated into the parallel simulator without any changes to the models, which are legacy models inherited from the predecessor EU FP6 project SHAPES. The programming model is constructed in such a way that these models already fulfilled the requirements for safe and deterministic integration. The only work required was to augment the simulation structure declared during SystemC elaboration with a couple of utility classes.
The new programming model and the set of required parallel SystemC kernel extensions, together with a set of utility classes, form the legaSCi methodology.
Systems that can be partitioned into distinct parts that do not communicate with each other at high frequency, as is the case for the tiled EURETILE system, are good candidates for successful acceleration using legaSCi.