ASIP Instruction Set Extension Synthesis

Motivation

Modern embedded systems require programmable and customizable processor platforms targeted at specific application domains. Mapping an application efficiently onto a custom processor is not easy. ASIPs are therefore seldom designed from scratch; instead, a configurable-processor approach is adopted. The hardware consists of a predefined, pre-verified central core (usually called the base processor) and an extension fabric in which custom instructions can be placed to increase efficiency. This process is known as ISA customization: Instruction Set Extensions (ISEs) are inserted into the ASIP's ISA. In many ASIP design or customization flows, ISEs constitute the primary (and in some cases the only) source of hardware acceleration for a given target application. ISEs are not unique to ASIP design; they are also found in digital signal processors and network processors. However, since ASIPs are usually designed by smaller engineering groups, and a compiler may not be available even for the initial architecture, there is an obvious need for a clear design flow, extensive automation, and a reassessed concept of compilers.

ISE Design-Flow

The proposed ISE design flow is composed of three major parts: a frontend, a core ISE generation algorithm, and a backend. The frontend analyzes and profiles the application's C code. It first uses the LANCE compiler to generate three-address code and then builds a Control Data Flow Graph (CDFG) from this code. The CDFG generation process creates a single node for the operation contained in an IR statement, after recursively creating DFG nodes for all of its constituent operands. Optionally, the designer can use the µ-profiler to locate the hot spot of the application, that is, the portion of code whose acceleration in hardware would benefit execution speed most. This portion of the code is then also represented as a graph.

The generated graph is the input to the core ISE generation algorithm. The problem of identifying custom instructions is reduced to a constrained search in the data flow graph. Any newly selected custom instruction has to comply with a set of rules. These include data-flow constraints (such as guaranteeing that the new code can be scheduled and preventing cyclic dependencies between instructions), latency and area constraints (e.g. a new instruction should fit in one clock cycle), and architectural constraints (such as the number of available general-purpose (GP) registers and the prohibition of main memory accesses). The limited number of GP registers is the most severe restriction, so internal registers, visible only inside the extension fabric, have been proposed. They do not affect the port area of the original register file and allow a larger number of inputs and outputs for the custom instructions. The search through the graph can then be formulated as an Integer Linear Program (ILP) in which each original instruction has an indicator variable that states whether it is part of the new custom instruction.
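The data-flow and architectural constraints described above can be made concrete with a small legality check. The Python sketch below is only an illustration, not the tool's implementation. It assumes that schedulability is expressed as convexity of the selected node set (no path may leave the candidate instruction and re-enter it), and that the register constraint is a simple limit on input and output operands. The graph, the node names, and the functions `is_convex`, `io_counts`, and `is_legal` are hypothetical.

```python
# Hypothetical toy DFG: node -> list of successor nodes.
# "a", "b", "c" are live-in values, "out" is a consumer outside the candidate.
DFG = {
    "a": ["mul"],
    "b": ["mul"],
    "c": ["sub"],
    "mul": ["sub", "add"],
    "sub": ["add"],
    "add": ["out"],
    "out": [],
}


def is_convex(dfg, cut):
    """Return True if no path leaves `cut` and later re-enters it."""
    frontier = [s for n in cut for s in dfg[n] if s not in cut]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        for nxt in dfg[node]:
            if nxt in cut:
                return False  # an outside node feeds back into the cut
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True


def io_counts(dfg, cut):
    """Count distinct values entering the cut and cut nodes read outside it."""
    inputs = {p for p, succs in dfg.items()
              if p not in cut and any(s in cut for s in succs)}
    outputs = {n for n in cut if any(s not in cut for s in dfg[n])}
    return len(inputs), len(outputs)


def is_legal(dfg, cut, max_in, max_out, forbidden=frozenset()):
    """Check forbidden operations, convexity, and the I/O operand limits."""
    cut = set(cut)
    if cut & set(forbidden):
        return False  # e.g. main memory accesses
    n_in, n_out = io_counts(dfg, cut)
    return is_convex(dfg, cut) and n_in <= max_in and n_out <= max_out
```

In this toy graph, `{"mul", "add"}` is rejected because `sub` lies outside the candidate on a path from `mul` to `add`, whereas `{"mul", "sub", "add"}` is convex but needs three inputs. With an assumed limit of two GP register read ports it is illegal, which is the situation the internal registers are meant to relieve.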
The objective function to be maximized is the benefit of the custom instruction, computed as the speed-up it would bring to the application execution, reduced by the communication overhead incurred when internal registers are used. The second step of the ISE generation algorithm is another ILP, for edge assignment. It assigns each graph edge to either GP registers or internal registers so as to minimize the data copying overhead.

Finally, the ISE synthesis backend receives an ISE-annotated DFG and translates it into real-life processor models. It is built in two parts. The first part applies a set of generic transformations to the DFG to produce an executable sequence of base processor instructions and custom instructions, while the second part encapsulates most of the architecture-specific details. Currently, the backend can generate ISE descriptions for three state-of-the-art processor customization frameworks: two configurable-processor-based design flows (MIPS CoreXtend and ARC configurable cores) and one ADL-based design flow (LISATek). Typically, the framework generates three different sets of files:

  1. Modified source code of the target application in ANSI C where the ISEs are inserted using calls to assembly functions.
  2. Definition of the assembly function for each ISE. The definitions of all the ISEs are generated in a separate header file, which is included by the modified source code.
  3. Behavior of the ISEs in an RTL-like format. This is primarily required for ISE implementation. The ISE behavior can be used directly for ISS or RTL generation, or can be hand-optimized for the final ASIP implementation.
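
To make the selection step of the ISE generation algorithm more concrete, the following Python sketch evaluates a benefit function of the kind described above: cycles saved by the custom instruction, reduced by a copy overhead for operands that do not fit into the GP register ports and therefore travel through internal registers. It is not the ILP formulation used by the tool. Exhaustive enumeration of the 0/1 indicator vectors stands in for an ILP solver, convexity is assumed as the schedulability condition, and the graph, the cycle counts, the port limits, and all function names are hypothetical.

```python
from itertools import product

# Hypothetical hot-spot DFG: operation -> successor operations.
SUCCS = {"mul": ["sub", "add"], "sub": ["add"], "add": ["st"], "st": []}
SW_CYCLES = {"mul": 3, "sub": 1, "add": 1, "st": 1}   # assumed software latencies
EXT_INPUTS = {"mul": 2, "sub": 1, "add": 0, "st": 0}  # operands from outside the graph
FORBIDDEN = {"st"}       # memory accesses may not be part of an ISE
GPR_IN, GPR_OUT = 2, 1   # assumed GP register file ports per instruction
COPY_COST = 1            # assumed cycles per value moved via an internal register
ISE_LATENCY = 1          # the custom instruction is assumed to take one cycle


def is_convex(cut):
    """Return True if no path leaves `cut` and later re-enters it."""
    frontier = [s for n in cut for s in SUCCS[n] if s not in cut]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        for nxt in SUCCS[node]:
            if nxt in cut:
                return False
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True


def io_counts(cut):
    """Count input values and output values of the candidate instruction."""
    outside_preds = {p for p, succs in SUCCS.items()
                     if p not in cut and any(s in cut for s in succs)}
    n_in = sum(EXT_INPUTS[n] for n in cut) + len(outside_preds)
    n_out = sum(1 for n in cut if any(s not in cut for s in SUCCS[n]))
    return n_in, n_out


def benefit(cut):
    """Cycles saved, minus copy overhead for operands beyond the GPR ports."""
    n_in, n_out = io_counts(cut)
    spilled = max(0, n_in - GPR_IN) + max(0, n_out - GPR_OUT)
    saved = sum(SW_CYCLES[n] for n in cut) - ISE_LATENCY
    return saved - COPY_COST * spilled


def best_ise():
    """Try every indicator vector x and keep the legal cut with the best gain."""
    ops = list(SUCCS)
    best_cut, best_gain = frozenset(), 0
    for x in product((0, 1), repeat=len(ops)):
        cut = frozenset(op for op, xi in zip(ops, x) if xi)
        if not cut or cut & FORBIDDEN or not is_convex(cut):
            continue
        gain = benefit(cut)
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut, best_gain
```

With these assumed numbers, `best_ise()` selects the operations `mul`, `sub`, and `add` with a gain of 3 cycles: five software cycles are replaced by one custom instruction cycle, and one cycle is lost to copying the third input through an internal register. Enumeration is only viable for tiny graphs, which is why an ILP formulation is attractive for realistic hot spots.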

Contact

Juan Fernando Eusse, Jovana Jovic, Kingshuk Karuri

Publications

Karuri, K., Leupers, R., Ascheid, G., Meyr, H. and Kedia, M.: Design and Implementation of a Modular and Portable IEEE 754 Compliant Floating-Point Unit, in Design, Automation & Test in Europe (DATE) (Munich, Germany), Mar. 2006


Kraemer, S., Gao, L., Weinstock, J. H., Leupers, R., Ascheid, G. and Meyr, H.: HySim: A Fast Simulation Framework for Embedded Software Development, in Proceedings of the 5th Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '07) (Salzburg, Austria), 2007