Institute for Communication Technologies and Embedded Systems

Parallel SystemC Simulation

Motivation

Over the past years, Virtual Platforms have gained significant importance in the field of embedded system design. They allow HW/SW co-design, enable early design space exploration, and reduce time-to-market as well as engineering cost and effort. However, increasing system complexity reduces simulation performance and threatens the viability of Virtual Platforms as design tools. To counter this, ICE advances research in this domain with novel parallel simulation technologies based on the de facto industry standard SystemC. Results include the synchronous simulation engine legaSCi and the time-decoupled SystemC kernel SCope.

Modern embedded system designs include complex multi-core processors, multi-level memory hierarchies and specialized hardware accelerators for radio or image processing. Each single component has to be simulated individually, causing simulation performance to drop.

Traditional techniques attempt to counter this by raising the abstraction level of the simulation models. A typical example of this is Transaction Level Modeling (TLM), where communication between components is abstracted by passing messages (transactions) using interface method calls (IMCs) instead of modeling individual signals.
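The idea of replacing signal-level communication with interface method calls can be sketched in plain C++. This is a minimal illustration only and does not use the SystemC TLM API; the names `Transaction`, `TransportIf`, `Memory` and `write_then_read` are hypothetical and chosen for this example.

```cpp
#include <cstdint>
#include <map>

// Hypothetical transaction payload: one message stands in for many signal toggles.
struct Transaction {
    enum class Command { Read, Write };
    Command command;
    std::uint64_t address;
    std::uint32_t data;  // value to write, or read result filled in by the target
};

// Hypothetical transport interface implemented by every target component.
struct TransportIf {
    virtual ~TransportIf() = default;
    virtual void transport(Transaction& tx) = 0;
};

// A simple memory target: a complete access is served by a single method call.
class Memory : public TransportIf {
public:
    void transport(Transaction& tx) override {
        if (tx.command == Transaction::Command::Write) {
            cells_[tx.address] = tx.data;
        } else {
            auto it = cells_.find(tx.address);
            tx.data = (it != cells_.end()) ? it->second : 0;  // unwritten cells read as 0
        }
    }

private:
    std::map<std::uint64_t, std::uint32_t> cells_;
};

// An initiator (e.g. a processor model) only needs a reference to the interface.
inline std::uint32_t write_then_read(TransportIf& target, std::uint64_t addr,
                                     std::uint32_t value) {
    Transaction wr{Transaction::Command::Write, addr, value};
    target.transport(wr);
    Transaction rd{Transaction::Command::Read, addr, 0};
    target.transport(rd);
    return rd.data;
}
```

The initiator never observes individual bus signals; it sees only the result of each call, which is what makes this style of model cheaper to simulate.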

An orthogonal approach to TLM is to distribute the simulation work among multiple cores of the host system. Today's workstations are equipped with multiple processing cores, making them an ideal target for parallel simulation. However, traditional simulation engines such as the OSCI SystemC kernel use only a single thread for simulation, thereby leaving a lot of computational potential unused.

Challenges

The principles of parallel discrete event simulation have been well researched for over 30 years, but they have not yet found their way into mainstream Electronic System Level (ESL) simulation. This is due to some key challenges that need to be addressed simultaneously:

  • Performance motivates parallel simulation in the first place. A parallel simulator should be able to scale up depending on the number of cores available on the host system.
  • Compatibility with existing simulation models must be maintained. A parallel simulator has to provide a virtual sequential environment for models to operate in. This is essential since most existing models are not equipped to deal with thread-safety issues.
  • Determinism is a key factor that makes virtual platforms valuable tools for software debugging. Today's software developers expect to be able to reproduce problematic states within the target software reliably in order to analyze and debug the problem.

Time-Decoupled Simulation

The SCope SystemC kernel is a custom simulation kernel that is fully compatible with the IEEE SystemC standard. It employs time-decoupling to distribute the simulation load over multiple host cores. Each simulation thread has its own state, such as time, and executes its own simulation loop in a decoupled fashion: the local time-stamps of the individual threads are allowed to deviate from each other by up to a constant called the lookahead. This decoupling allows for much better performance, since threads need not wait for one another as long as they stay within the lookahead interval. However, the increased parallel performance comes at a price: all communication between threads (e.g. TLM transactions) needs to be stated sufficiently ahead of time to ensure that the intended arrival time-stamp of the message has not yet elapsed in the context of the target thread. This is because the time-stamp on the receiver side is not known exactly, only that it must lie within the lookahead interval. Sometimes, however, designers cannot state transactions ahead of time. SCope deals with this problem by imposing an artificial delay on the transaction before transmission. This incurs a timing error for the transaction, but allows the rest of the system to continue running in parallel.
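The lookahead rule and the artificial delay can be illustrated with a small plain C++ sketch. It is not SCope's implementation: the names `may_advance_to`, `schedule_transaction` and `Delivery` are hypothetical, and the sketch assumes that a receiver may already be a full lookahead ahead of the sender, so a transaction is only safe if it arrives at least one lookahead after the sender's local time.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using sim_time = std::uint64_t;  // simulated time, e.g. in nanoseconds

// A thread may move its local time to `next_time` only if it stays within
// `lookahead` of the slowest thread (assumed rule for this sketch).
inline bool may_advance_to(sim_time next_time,
                           const std::vector<sim_time>& all_local_times,
                           sim_time lookahead) {
    if (all_local_times.empty()) return true;  // nothing to synchronize with
    sim_time slowest =
        *std::min_element(all_local_times.begin(), all_local_times.end());
    return next_time <= slowest + lookahead;
}

struct Delivery {
    sim_time arrival;       // time-stamp at which the target sees the transaction
    sim_time timing_error;  // artificial delay added on top of the requested delay
};

// Assumption: the receiver can be at most `lookahead` ahead of the sender, so
// any arrival time >= sender_now + lookahead cannot have elapsed there yet.
inline Delivery schedule_transaction(sim_time sender_now,
                                     sim_time requested_delay,
                                     sim_time lookahead) {
    sim_time effective_delay = std::max(requested_delay, lookahead);
    return Delivery{sender_now + effective_delay,
                    effective_delay - requested_delay};
}
```

Under these assumptions, a transaction stated with a delay of at least the lookahead is delivered with no timing error, while one stated too late is pushed back to the edge of the lookahead interval and the difference is recorded as its timing error.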

Experimental Results

The proposed approach has been tested with the EURETILE and GEMSCLAIM simulators. Results using four threads and varying lookahead values were compared to sequential execution with the regular OSCI SystemC kernel. For EURETILE a speedup of 4x can be achieved while still maintaining deterministic execution and exact timing accuracy. The GEMSCLAIM simulator achieves a speedup of 3.1x with a 1% error in timing. In general, one can observe that a higher lookahead usually yields better performance, but reduces timing accuracy.

 

Synchronous Simulation

The legaSCi simulation engine is an extension to the parSC SystemC kernel which enables synchronous parallel simulation even for legacy components written in a thread-unsafe manner. This is achieved by grouping the simulation processes of such components into containment zones. The internal SystemC scheduler then makes sure that processes that belong to the same zone are never executed concurrently. Furthermore, all incoming TLM IMCs are properly synchronized with the execution of the zone.

Simulation processes from different zones or those which do not belong to any zone are executed concurrently in a synchronous fashion: all activity that happens at the exact same point in simulated time is distributed among multiple threads on the simulation host.
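One way to picture the zone rule is as a partitioning step performed for each point in simulated time. The plain C++ sketch below is not legaSCi's scheduler; `Process`, `WorkUnit`, `kNoZone` and `build_work_units` are hypothetical names. It groups all runnable processes of the same zone into one work unit that would run sequentially, while every zone-free process becomes its own unit. Distinct units could then be handed to different host threads.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

constexpr int kNoZone = -1;  // marks a process that belongs to no containment zone

struct Process {
    std::string name;
    int zone;  // zone id, or kNoZone
};

// Process names that must execute sequentially on a single host thread.
using WorkUnit = std::vector<std::string>;

// Partition the processes runnable at the current simulated time into work
// units: processes of one zone share a unit, zone-free processes stand alone.
inline std::vector<WorkUnit> build_work_units(const std::vector<Process>& runnable) {
    std::vector<WorkUnit> units;
    std::map<int, std::size_t> unit_of_zone;  // zone id -> index into `units`
    for (const Process& p : runnable) {
        if (p.zone == kNoZone) {
            units.push_back(WorkUnit{p.name});
            continue;
        }
        auto it = unit_of_zone.find(p.zone);
        if (it == unit_of_zone.end()) {
            // First process of this zone: open a new unit for it.
            it = unit_of_zone.emplace(p.zone, units.size()).first;
            units.emplace_back();
        }
        units[it->second].push_back(p.name);
    }
    return units;
}
```

Because two processes of the same zone never end up in different units, they can never run concurrently, which is the property that thread-unsafe legacy components rely on.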

Experimental Results

Compared to the industry-standard OSCI SystemC kernel, legaSCi achieves a speedup of up to 2.1x while running the EURETILE full system simulator on a quad-core host.

 

Publications

Schumacher, C., Weinstock, J. H., Leupers, R., Ascheid, G., Tosoratto, L., Lonardo, A., Petras, D. and Hoffmann, A.: legaSCi: Legacy SystemC Model Integration into Parallel Simulators, in ACM Transactions on Embedded Computing Systems (ACM TECS), Vol. 13, No. 5s, ACM, pp. 165:1-165:24, Dec. 2014, ISSN: 1539-9087, DOI: 10.1145/2678018
Weinstock, J. H., Schumacher, C., Leupers, R., Ascheid, G. and Tosoratto, L.: Time-Decoupled Parallel SystemC Simulation, in Proceedings of the Conference on Design, Automation & Test in Europe (DATE), European Design and Automation Association, 2014, DOI: 10.7873/DATE.2014.204 ©2014 IEEE