Design Automation for ASIPs
The last decade witnessed a dramatic evolution in electronic systems design, especially in the field of embedded systems. Advances in technology have enabled system architects to devise increasingly complex devices composed of several heterogeneous elements, such as hardware accelerators, general purpose controllers and digital signal processing controllers. Application Specific Instruction Set Processors (ASIPs) are architectures optimized for an application or an application domain that retain most of the flexibility of General Purpose Processors (GPPs). The approach has proven beneficial in many cases, and there has been significant work on methodologies that ease the ASIP design process. One of the main drawbacks of the existing methodologies is that designing an ASIP for a given application or application domain requires extensive knowledge of the inner workings of the application algorithms. Furthermore, several time-consuming design iterations may be required, with no guarantee of achieving an adequate performance/energy trade-off at the end of each one. Figure 1 shows the flow diagram of well-established methodologies for ASIP design based on Architecture Description Languages (ADLs).
The process starts with a set of application specifications and requirements, and typically with an already existing processor model. Throughout the process, both the input specification and the processor are modified iteratively until the design constraints (in terms of performance, area and power/energy consumption) are satisfied. Although modern methodologies and tools help to speed up ASIP design, we have identified three main aspects in the state of the art that we believe are still missing from current ASIP design flows:
- A flow is needed that identifies the intrinsic complexities of the algorithm and presents them to the designer. With this information, the designer can devise an initial ASIP architecture that comes at least close to fulfilling the requirements and constraints of the application.
- A tool is required that estimates, early in the design process, the effect that architectural changes will have on performance, in order to further increase productivity.
- The third and final requirement is a way to integrate such tools into an existing ASIP design flow, given that current ADL-based flows are already sufficiently mature.
The aim of the project is to propose a methodology and implement a toolset that fills the aforementioned gaps and can be integrated with different existing design flows. The envisioned methodology is flexible enough to target arbitrary architectures (either completely new or inherited). Besides providing feedback on the traditional performance metrics (speed, area, power/energy), it also supports the designer during algorithmic exploration, which is a critical part of ASIP design.
Figure 2 shows a flow diagram of the proposed methodology. The inputs are the application specification and constraints, together with a processor model that is either inherited (legacy models are not uncommon) or provided as a first architectural draft by an experienced designer. The toolset then reports back to the user how well this processor model fits the application, in the form of an accurate estimate of the achievable application performance. This estimation should be fast and based on a high-level processor model. Design iterations then become very fast, because they no longer require actual architecture implementation, synthesis and simulation. The resulting model serves as the entry point for an existing ADL-based flow.
Performance Estimation for Tailored Datapaths and Instruction Sets
One of the first steps of ASIP design consists of envisioning an instruction set (and the underlying datapath) that properly matches an application. To do this in a time-efficient manner, we have proposed and implemented an engine that accurately estimates the performance (in clock cycles) that an application achieves on an architecture envisioned by the designer. The estimation process starts with the application source code (C) and a high-level processor model provided by the designer. The application is then compiled into a compiler intermediate representation, and the cost of its operations is calculated according to the processor model. These costs are finally merged with profile information to derive a performance figure in clock cycles. Our engine supports the following use cases, as shown in Figure 3.
- Instruction set and datapath design
- Custom hardware block integration
- Evaluation of what-if customization scenarios
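The estimation step described above (per-operation costs taken from a processor model, merged with profile counts) can be illustrated with a minimal sketch. This is not the actual engine: the operation names, cycle costs, basic blocks and execution counts are all hypothetical, and the model assumes a single-issue machine in which operation costs simply add up. The second half shows a what-if scenario in which a hypothetical single-cycle multiply-accumulate ("mac") instruction replaces adjacent multiply/add pairs.

```python
from dataclasses import dataclass

# Hypothetical high-level processor model: cycles per IR operation.
# Operation names and cycle counts are illustrative assumptions.
BASELINE_MODEL = {"load": 2, "store": 1, "mul": 3, "add": 1, "branch": 2}


@dataclass
class BasicBlock:
    name: str
    ops: list        # IR operations in the block, e.g. ["load", "mul"]
    exec_count: int  # how often the block executed, taken from profiling


def block_cost(block, model):
    """Static cost of one execution of the block, in cycles."""
    return sum(model[op] for op in block.ops)


def estimate_cycles(blocks, model):
    """Merge static block costs with profile counts into a cycle estimate."""
    return sum(block_cost(b, model) * b.exec_count for b in blocks)


def fuse_mul_add(block):
    """What-if rewrite: replace each adjacent mul, add pair with one 'mac'."""
    fused, i = [], 0
    while i < len(block.ops):
        if block.ops[i:i + 2] == ["mul", "add"]:
            fused.append("mac")
            i += 2
        else:
            fused.append(block.ops[i])
            i += 1
    return BasicBlock(block.name, fused, block.exec_count)


# Made-up profile of a small filter kernel.
blocks = [
    BasicBlock("entry", ["load", "add"], 1),
    BasicBlock("loop", ["load", "load", "mul", "add", "branch"], 1000),
    BasicBlock("exit", ["store"], 1),
]

baseline = estimate_cycles(blocks, BASELINE_MODEL)

# What-if scenario: extend the model with a single-cycle MAC instruction.
mac_model = dict(BASELINE_MODEL, mac=1)
customized = estimate_cycles([fuse_mul_add(b) for b in blocks], mac_model)

print(baseline, customized)  # 10004 7004
```

In this toy example the loop body dominates the estimate, so the customization pays off only because the fused pair sits in the hot block; the same change applied to a rarely executed block would barely move the total.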
We evaluated our performance estimation engine with two goals in mind. The first was to validate the estimates comprehensively against cycle-accurate simulation results for well-known benchmarks running on commercially available processors, such as:
- Synopsys Processor Designer RISC (PD-RISC)
- Texas Instruments C64x, C66x and C67x DSPs
Figure 4 shows that our estimates are on average within 15% of the actual simulation results, while being produced much faster than with simulators: on average, the engine is 248x faster than the PD-RISC simulators and 67x faster than the TI simulators. The second goal was to use the engine during an ASIP design case study and to validate the estimates against results from actual cycle-accurate architectural simulation. During this process we observed estimation errors similar to those seen in the initial validation. For more detailed results and analyses, please refer to the listed publications.
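As an illustration of how figures such as "within 15% on average" and "248x faster" can be derived, the sketch below computes a mean absolute relative error and a mean speedup from per-benchmark measurements. The benchmark names and all numbers are made up, and the use of an arithmetic mean is an assumption; the text above does not state how the averages were formed.

```python
def relative_error(estimated_cycles, simulated_cycles):
    """Absolute relative error of an estimate against the simulated reference."""
    return abs(estimated_cycles - simulated_cycles) / simulated_cycles


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical measurements, NOT results from the project:
# (benchmark, estimated cycles, simulated cycles, estimation time [s], simulation time [s])
results = [
    ("bench_a", 1100, 1000, 0.5, 50.0),
    ("bench_b", 1800, 2000, 0.25, 25.0),
]

mean_error = mean(relative_error(est, sim) for _, est, sim, _, _ in results)
mean_speedup = mean(t_sim / t_est for _, _, _, t_est, t_sim in results)

print(f"mean error: {mean_error:.1%}, mean speedup: {mean_speedup:.0f}x")
```

Taking the absolute value before averaging matters: over- and underestimates would otherwise cancel out and make the engine look more accurate than it is.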
Juan Fernando Eusse
Murillo, L. G., Wawroschek, S., Castrillon, J., Leupers, R. and Ascheid, G.: Automatic Detection of Concurrency Bugs through Event Ordering Constraints, in Proceedings of the Conference on Design, Automation & Test in Europe (DATE) (Dresden, Germany), Mar. 2014 ©2014 IEEE
Schürmans, S., Zhang, D., Leupers, R., Ascheid, G. and Chen, X.: Improving ESL Power Models using Switching Activity Information from Timed Functional Models, in Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems (New York, NY, USA), ACM, Jun. 2014, ISBN: 978-1-45032-941-5, DOI: 10.1145/2609248.2609250
Ishaque, A., Sakulkar, P. and Ascheid, G.: Capacity Analysis of Uplink Multi-user SC-FDMA System With Frequency-Dependent I/Q Imbalance, in 51st Annual Allerton Conference on Communication, Control, and Computing (Monticello, IL, USA), Oct. 2013, DOI: 10.1109/Allerton.2013.6736643 ©2013 IEEE