Power Estimation at Electronic System Level
Power consumption of chips is important, especially for mobile devices. To meet low-power requirements during the design of a new chip, power consumption has to be considered as early as design space exploration. Unfortunately, classic power estimation tools operating at low abstraction levels are too time-consuming to permit this. A methodology for power estimation at Electronic System Level (ESL) has therefore been developed. It allows power models to be created for existing ESL models. An evaluation of the methodology for modern communication architectures has shown that the estimation error is about 10% on average. For the ARM Cortex A9 processor, the methodology resulted in an estimation error of about 5% on average. An even smaller average estimation error of less than 4% has been shown in the Blackfin 609 DSP case study.
The goal of the ESL power estimation research project is to find a methodology to create power models for existing ESL models, addressing the following major challenges:
Applicable to various kinds of components, e.g. processor cores and communication architectures
High abstraction level, i.e. not requiring information about registers and signals at RTL or lower levels
Semi-automatic power model generation requiring only little manual work
Support for source-based (white box) ESL models and black box IP components
Output of a power trace over time in addition to the total power, in order to support determining phases of different power consumption
Power estimation accuracy suitable for design space exploration, i.e. better than +/- 30%
Applications in the embedded domain, such as wireless communication and multimedia, are nowadays very complex. To meet the demanding application requirements regarding latency and throughput, systems with high computational performance are needed. This calls for multi-processor systems-on-chip (MPSoCs).
The performance of an MPSoC is largely determined by the contained processor cores and hardware accelerators, but for large-scale systems the communication architecture between the different components is also of vital importance. If it is not properly designed, it could be the bottleneck of the system. Furthermore, the power and energy consumption of all components is also important and has to be respected during the development of an MPSoC. Trade-offs have to be made between the power consumption and the performance when choosing processor cores and designing the communication architectures.
Because they are extremely slow, traditional power estimation techniques at register transfer level (RTL) or after place and route are no longer suitable for efficient design space exploration of large systems. However, such exploration is very important for developing competitive products and keeping the time-to-market short. Therefore, high-level power estimation at Electronic System Level (ESL) is desired, as it can guide the designer to make design decisions efficiently at an early stage.
The ESL power methodology starts from an ESL system containing a model of the component for which a power model shall be created. Additionally, there has to be a means of obtaining a reference power trace of this component in one scenario. Possible options for obtaining this reference power trace are power simulation at lower levels, such as gate or layout level, and power measurements on real hardware containing the component (Figure 1).
The methodology will then take the available power information from the reference scenario and back-annotate it to the ESL model, i.e. create an ESL power model for the component (Figure 2).
The new ESL power model is then stored in the ESL library together with the classic ESL model. It can be used afterwards to estimate the power consumption of the component in the same system running other scenarios (Figure 3) and the power consumption of the component in other systems (Figure 4). If power models for all components of a system have been created, the power consumption of the entire system can be predicted using fast ESL simulations.
Linear Power Model
Power consumption of a component depends on the processes in its internals. As the corresponding ESL model has the same functionality as the hardware or the low-level model, it contains a large part of the information about the processes.
In general, more activity in the internals of a component corresponds to a higher power consumption. Furthermore, this dependency is approximately linear. Therefore, existing source-based ESL models are extended to deliver all available information about their internals to a linear power model that computes the power estimation as a linear combination of the traced internals (Figure 5).
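The linear combination described above can be sketched as follows. This is a minimal illustration and not the project's implementation: the counter names (`instr_fetch`, `mem_access`), the constant term `const` standing in for activity-independent power, and all coefficient values are hypothetical.

```python
from typing import Dict, List

def estimate_power(coefficients: Dict[str, float], activity: Dict[str, float]) -> float:
    """Estimated power for one sampling interval: a linear combination of
    the traced activity counters. Counters missing from a sample count as 0."""
    return sum(coefficients[name] * activity.get(name, 0.0) for name in coefficients)

# Hypothetical coefficients; "const" is paired with a constant activity of 1.0
# in every sample and models power that does not depend on activity.
coefficients = {"const": 0.10, "instr_fetch": 0.002, "mem_access": 0.005}

# Hypothetical ESL activity trace, one dictionary per sampling interval.
trace: List[Dict[str, float]] = [
    {"const": 1.0, "instr_fetch": 100, "mem_access": 10},
    {"const": 1.0, "instr_fetch": 20, "mem_access": 2},
]

# Power trace over time: one estimate per sampling interval.
power_trace = [estimate_power(coefficients, sample) for sample in trace]
```

Evaluating the model per sampling interval, rather than once per simulation, is what yields a power trace over time instead of a single total.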
For black box IP models, the inner details are not visible and only the inputs and outputs are observable. In many cases, this information is sufficient. If it is not, the details inside the black box can be approximated by an external state machine driven by the traces from the inputs and outputs, a so-called power state machine.
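A power state machine of this kind might look like the sketch below. The two states, the idle timeout, and the feature names are assumptions chosen for illustration; the source does not specify the states used for any particular component.

```python
from enum import Enum
from typing import Dict, List

class State(Enum):
    IDLE = "idle"
    ACTIVE = "active"

class PowerStateMachine:
    """Approximates a hidden internal state of a black box component
    using only the activity observed at its interfaces (hypothetical design)."""

    def __init__(self, idle_timeout: int = 2) -> None:
        self.state = State.IDLE
        self.idle_timeout = idle_timeout  # quiet intervals before falling back to IDLE
        self._quiet_intervals = 0

    def step(self, transactions: int) -> State:
        """Advance by one sampling interval given the observed transaction count."""
        if transactions > 0:
            self.state = State.ACTIVE
            self._quiet_intervals = 0
        else:
            self._quiet_intervals += 1
            if self._quiet_intervals >= self.idle_timeout:
                self.state = State.IDLE
        return self.state

def state_features(io_trace: List[int]) -> List[Dict[str, float]]:
    """Turn an I/O trace into per-interval inputs for a linear power model."""
    psm = PowerStateMachine()
    features = []
    for transactions in io_trace:
        state = psm.step(transactions)
        features.append({
            "transactions": float(transactions),
            "is_active": 1.0 if state is State.ACTIVE else 0.0,
        })
    return features

features = state_features([3, 0, 0, 0, 1])
```

The state indicator is simply one more traced quantity, so it can be fed into the linear power model alongside the directly observed interface activity.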
Calibration is the process of determining the coefficients of the linear power model from a reference scenario. As shown in Figure 6, the ESL model and either real hardware or a low-level model are fed with the same input data. Thus, both the hardware (or low-level model) and the ESL model will perform the same actions, and their internal details will reflect the same processes. Using hardware measurements or traditional low-level power estimation approaches, a reference power trace is generated. It corresponds to the activity information recorded at ESL. Together, the ESL traces and the reference power data form the calibration data.
The linear power model used by the methodology contains an unknown factor vector a, which has to be determined before the power model can be used. The factors in this vector are calculated using the calibration data. The estimated power consumption P_est computed by the ESL power model shall match the reference power consumption P_ref in the calibration scenario (Figure 7). Different methods can be used to compute the factors. A simple approach is minimizing the mean squared error. More sophisticated approaches like non-negative least-squares algorithms exist as well.
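The simple mean-squared-error approach can be sketched in plain Python by solving the normal equations (X^T X) a = X^T P_ref, where each row of X holds the traced quantities of one sampling interval. The calibration data below is synthetic and the function names are hypothetical; this is an illustration of ordinary least squares, not the solver used in the project.

```python
from typing import List

def solve_linear_system(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs stay untouched.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: traced quantities are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def calibrate(X: List[List[float]], p_ref: List[float]) -> List[float]:
    """Least-squares factors a minimizing sum((X a - p_ref)^2)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xtp = [sum(row[i] * p for row, p in zip(X, p_ref)) for i in range(k)]
    return solve_linear_system(XtX, Xtp)

# Synthetic calibration data: columns are (constant 1.0, counter 1, counter 2).
X = [[1.0, 100.0, 10.0], [1.0, 20.0, 40.0], [1.0, 50.0, 0.0], [1.0, 0.0, 30.0]]
a_true = [0.1, 0.002, 0.005]  # hypothetical "true" factors used to build P_ref
p_ref = [sum(a * x for a, x in zip(a_true, row)) for row in X]

a = calibrate(X, p_ref)  # recovers a_true up to rounding on this noise-free data
```

Plain least squares can produce negative factors when traced quantities are correlated. A non-negative least-squares solver, for example `scipy.optimize.nnls` if SciPy is available, avoids this at the cost of an extra dependency.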
Once created, the ESL power model can be used to estimate the power consumption of the component in other scenarios by just running fast ESL simulations (see Figure 5 above). The traced information from the extended ESL model is fed into the ESL power model, which estimates the power trace over time as well as the total power consumption during the entire simulation run time.
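Reducing an estimated power trace to totals for a whole run could be done as in the sketch below. It assumes a fixed sampling interval and reports both the average power and the energy, since either may be meant by "total"; the trace values and the interval length are made up.

```python
from typing import List, Tuple

def summarize_power_trace(power_trace_w: List[float], interval_s: float) -> Tuple[float, float]:
    """Return (average power in W, energy in J) for a power trace sampled
    at a fixed interval (assumption: every sample covers interval_s seconds)."""
    if not power_trace_w:
        raise ValueError("empty power trace")
    energy_j = sum(p * interval_s for p in power_trace_w)
    avg_power_w = energy_j / (len(power_trace_w) * interval_s)
    return avg_power_w, energy_j

# Hypothetical estimated trace: three samples of 1 ms each.
avg_power, energy = summarize_power_trace([0.35, 0.15, 0.25], interval_s=0.001)
```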
Case Study: Communication Architectures of MPSoCs
One case study in the context of ESL power estimation has focused on communication architectures of MPSoCs. Two different communication architectures have been investigated, an AXI crossbar (Figure 8) and a network-on-chip (NoC) with a 2D mesh topology (Figure 9). Multiple synthetic traffic patterns as well as the traffic of an application from the wireless communication domain have been used for calibration and power estimation. The observed power estimation errors compared to post-layout gate-level power simulation have been about 10% on average.
Case Study: ARM Cortex A9 Processor Core
As an example of a modern application processor, the ARM Cortex A9 processor has been chosen. The ESL power estimation methodology has been applied to create power models for the white box gem5 simulator and the black box Open Virtual Platforms processor model.
The gem5 simulator is an open source project whose internal details can all be observed and accessed (white box). It is already equipped with a tracing framework that can trace activity in the different parts of the model (e.g. fetched instructions, executed instructions per type, taken branches, register accesses, memory accesses). Those traces have been used as input to the ESL power model. The reference power curves have been obtained via hardware measurements of the ARM subsystem of the Texas Instruments OMAP4460 chip found on the PandaBoard ES (Figure 10). The power estimation error has been about 5% on average.
The ARM Cortex A9 model from Open Virtual Platforms (OVP) is an opaque binary object. Only its interfaces (instruction port, data port, interrupt ports) are documented; the rest of the model is hidden (black box). A SystemC-based platform has been constructed using the OVP processor models and an in-house coherent cache subsystem. A diagram of the platform is depicted in Figure 11. The ESL traces required by the power model have been collected only at the TLM (Transaction Level Modeling) interconnects between the models. No tracing has been done inside a model. Nevertheless, the power estimation error has been below 6%.
Case Study: Blackfin 609 DSP
Investigating digital signal processors in the context of heterogeneous MPSoCs is important. Following the black box approach, the ESL power estimation methodology has been validated against the Blackfin 609 DSP. The FinBoard equipped with this processor has served as the reference for measuring the power curves. A SystemC TLM virtual platform of the FinBoard has been built using in-house models for cache, bus and memory. The processor simulator of the Blackfin has been taken from gdb-utils and encapsulated in SystemC. Figure 12 shows the block diagram of the platform. Overall, the power estimation error has been less than 4% on average.
Increasing the simulation speed requires the use of DMI (Direct Memory Interface) access and temporal decoupling. When DMI is enabled in a black box environment, the TLM interconnects can no longer be traced. Hence, the traces have to be collected at a different point. Fortunately, for most processor models, such as OVP, only the instruction set simulator itself is the black box. Thus, the necessary traces can be collected within the SystemC wrapper. The two case studies, ARM Cortex A9 and Blackfin 609 DSP, show that the power estimation error remains below 6% when DMI and temporal decoupling are enabled.