Automated Implementation and Optimization of ASIPs
The Language for Instruction Set Architectures (LISA)
LISA is an Architecture Description Language (ADL) aimed at the formalized description of programmable architectures, their peripherals and interfaces. LISA supports the development of processors on various levels of abstraction above the traditional Register Transfer Level (RTL) and therefore enables a stepwise refinement of the architecture during exploration. The tools required for the exploration and development of new processor architectures can be generated automatically from a LISA description. These include an instruction set simulator, an assembler, a disassembler, a linker, and a C compiler. The language and a corresponding tool suite were originally developed at the Institute for Integrated Signal Processing Systems (ISS) and are now commercialized by CoWare.
RTL Processor Synthesis
The LISA language was designed to bridge the gap between hardware and software design. It not only provides the software developer with all required retargeted software development tools, it also enables the designer to synthesize the architecture from the same specification. The output of this processor synthesis process is a description of the architecture on Register Transfer Level (RTL) in a Hardware Description Language (HDL). Both VHDL and Verilog are supported by the RTL processor synthesis. Standard synthesis tools are then used to synthesize the gate level model.
On gate level, accurate information about physical characteristics such as chip area and maximum clock frequency of the architecture can be obtained. Since the required HDL model is generated automatically, the designer can use this information early and frequently during the design process in order to change and improve the architecture. Because the LISA model is refined successively, the transition from architecture exploration to implementation is seamless. Changes are made in the LISA model rather than in the HDL model. This makes it possible to maintain a single model for both the retargeting of the software development tools and the HDL code generation, which keeps the tools and the implementation consistent.
As we are targeting the development of Application Specific Instruction set Processors (ASIPs), which are highly optimized for one specific application domain, the generated HDL code has to fulfill tight constraints to be an acceptable replacement for HDL code handwritten by experienced designers. Therefore, effective high-level optimization techniques have been introduced. Additionally, modern designs should not neglect processor features such as a debug mechanism with JTAG interface or power save modes. The RTL processor synthesis framework therefore also supports the automatic generation of such features.
RTL processor synthesis proceeds in three steps. First, the LISA model is parsed and mapped to an intermediate representation called the Unified Description Layer (UDL). This mapping process also includes the integration of processor features like a debug mechanism with JTAG interface. Once the UDL is built, the optimizations are performed. Finally, the HDL model is generated from the UDL by a dedicated backend. The complete generation process is guided by a Graphical User Interface (GUI), which allows features to be configured and provides a graphical view of the architecture's structure during the synthesis process.
High-Level Optimization Techniques
The key motivation behind embedding high-level optimization techniques in ADL-driven automatic RTL generation is the availability of information about semantic relations and mutually exclusive execution, which can hardly be extracted from an RTL representation. The most important optimization techniques are briefly described in the following:
Decision Minimization: Decision Minimization utilizes semantic information in order to move condition-independent code out of the surrounding conditions. This optimization reduces multiplexer instantiations and improves the timing significantly.
Signal Scope Localization: In LISA, it is possible to declare and use signal resources globally in the model. During Signal Scope Localization, the locality of the signal usage is explored and the affected signals are converted into local resources.
Decoder Distribution: Decoder Distribution is a structural optimization in which the instruction decoder is distributed over the entire pipeline. Decoded signals from an earlier stage are fed into a later stage if they are used more than once.
Port Sharing: During the Port Sharing optimization, the exclusiveness relations of resource accesses are considered by mapping mutually exclusive accesses to shared ports.
Resource Sharing: Resource Sharing is performed using the exclusiveness information and cost models for chip area and signal delays. Based on this information, the cost models and constraints set by the designer, the sharing algorithm selects the sets of computational resources for sharing.
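The sharing idea behind Port Sharing and Resource Sharing can be illustrated with a minimal sketch. It is not the algorithm of the synthesis framework: the function name, the data layout, the greedy grouping strategy and the single-number area model are all assumptions made for illustration. Operations of the same kind are merged onto one unit only if they are pairwise mutually exclusive and the assumed multiplexer overhead is smaller than the area of an additional unit.

```python
def share_resources(ops, exclusive, unit_area, mux_area):
    """Greedily group operations onto shared functional units (illustrative only).

    ops:       list of (operation name, resource kind) tuples
    exclusive: set of frozenset({a, b}) pairs that never execute together
    unit_area: dict mapping resource kind to the area of one unit (assumed cost model)
    mux_area:  assumed area overhead for adding one operation to a shared unit
    """
    groups = []  # each group is one physical unit: {"kind": ..., "members": [...]}
    for name, kind in ops:
        placed = False
        for group in groups:
            if group["kind"] != kind:
                continue
            # Sharing is only legal if the new operation excludes every member.
            legal = all(frozenset((name, m)) in exclusive for m in group["members"])
            # Sharing only pays off if the multiplexer is cheaper than a new unit.
            if legal and mux_area < unit_area[kind]:
                group["members"].append(name)
                placed = True
                break
        if not placed:
            groups.append({"kind": kind, "members": [name]})
    return groups


# Hypothetical example: add_a and add_b belong to mutually exclusive instructions.
ops = [("add_a", "adder"), ("add_b", "adder"), ("add_c", "adder"), ("mul_a", "multiplier")]
exclusive = {frozenset(("add_a", "add_b"))}
units = share_resources(ops, exclusive, {"adder": 10, "multiplier": 50}, mux_area=1)
```

In this example the two exclusive additions end up on one adder, while add_c and mul_a each keep their own unit. Raising mux_area above the adder area would disable the sharing, which mirrors how designer constraints steer the tradeoff.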
Using these optimization techniques, different points of the design space can be explored and tradeoffs between physical characteristics can be made.
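To make the effect of Decision Minimization more concrete, the following sketch hoists assignments that open both branches of a condition in front of that condition. It is a deliberately simplified, purely syntactic stand-in: the function name and the representation of statements as (target, expression) tuples are assumptions, and the actual optimization works on semantic information from the LISA model rather than on textual equality.

```python
def hoist_common_prefix(cond_reads, then_branch, else_branch):
    """Move assignments that start both branches out of the condition.

    cond_reads:  set of signal names read by the condition
    then_branch: list of (target, expression) assignments
    else_branch: list of (target, expression) assignments
    Returns (hoisted, remaining_then, remaining_else).
    """
    hoisted = []
    i = 0
    while i < len(then_branch) and i < len(else_branch):
        stmt = then_branch[i]
        if stmt != else_branch[i]:
            break  # the branches diverge here, so this code depends on the condition
        target, _ = stmt
        if target in cond_reads:
            break  # hoisting would change the value the condition sees
        hoisted.append(stmt)
        i += 1
    return hoisted, then_branch[i:], else_branch[i:]


# Hypothetical example: "tmp" is computed identically in both branches.
then_branch = [("tmp", "a + b"), ("result", "tmp << 1")]
else_branch = [("tmp", "a + b"), ("result", "tmp")]
hoisted, new_then, new_else = hoist_common_prefix({"sel"}, then_branch, else_branch)
```

After hoisting, only result is still assigned under the condition, so only that signal needs a multiplexer in the generated hardware, while tmp is driven unconditionally.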
Generated Debug Mechanism and JTAG Interface
Besides optimization, the synthesis framework enables the automatic generation of processor features as all the required semantical information is available. One of these features is a hardware debug mechanism. It enables the designer to debug software in its final hardware environment by giving access to the state of the processor core via an additional interface. The JTAG interface is commonly used for this purpose.
On the one hand, the implementation of a debug mechanism requires new components. First of all, a component implementing the JTAG interface is required. The so-called Mode-Control and the Debug-State-Machine are the main components of the debug mechanism and control the behavior of the processor core during debugging. They are connected to the JTAG interface via a dedicated register, the Test-Data-Register.
On the other hand, all storage elements of the processor core implementation (e.g. registers, pipeline registers and memories) are affected. In order to determine the current state of the processor, these elements have to be readable by the Debug-State-Machine. Providing write access additionally makes it possible to manipulate the current processor state via the debug mechanism.
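The interplay of these components can be pictured with a small behavioral sketch. It is only an illustration under stated assumptions: the class names, the command names ("halt", "resume", "read", "write") and the register set are hypothetical, the serial shifting of the JTAG interface is not modeled, and the roles of Mode-Control and Debug-State-Machine are folded into one class.

```python
class Core:
    """Toy processor core whose storage elements are exposed to the debugger."""

    def __init__(self):
        self.registers = {"PC": 0, "R0": 0, "R1": 0}  # assumed register set
        self.halted = False

    def step(self):
        # A halted core keeps its state so that it can be inspected.
        if not self.halted:
            self.registers["PC"] += 1


class DebugStateMachine:
    """Controls the core and moves values through a test data register."""

    def __init__(self, core):
        self.core = core
        self.test_data_register = 0  # would be shifted in and out via JTAG

    def execute(self, command, name=None):
        if command == "halt":
            self.core.halted = True
        elif command == "resume":
            self.core.halted = False
        elif command == "read":
            # Read access: copy a storage element into the test data register.
            self.test_data_register = self.core.registers[name]
        elif command == "write":
            # Write access: overwrite a storage element from the test data register.
            self.core.registers[name] = self.test_data_register
        else:
            raise ValueError(f"unknown debug command: {command}")


# Hypothetical debug session: run, halt, inspect PC, patch R0, resume.
core = Core()
debug = DebugStateMachine(core)
for _ in range(3):
    core.step()
debug.execute("halt")
core.step()                      # has no effect while halted
debug.execute("read", "PC")
pc_seen_by_debugger = debug.test_data_register
debug.test_data_register = 42
debug.execute("write", "R0")
debug.execute("resume")
core.step()
```

The sketch shows why every storage element is affected: each one needs a read path, and optionally a write path, to the debug logic, which is the source of the area and timing overhead that can be explored early.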
Implementing such a debug feature manually on RTL would be a lengthy and error-prone task, resulting in an implementation that provides no flexibility in case the requirements change. With automatic generation of this processor feature, however, the designer can include the necessary debugging capabilities in the target architecture as early as the exploration cycle, since all the necessary flexibility is provided. Furthermore, the designer can explore the impact of the debug mechanism on physical characteristics such as area and timing early in the design process. The debug mechanism is highly configurable and can therefore be adapted to the requirements of the specific design, avoiding the waste of physical resources.
ASIP Design Case Studies
Several case studies in academia and industry have been used to prove the usability of our approach. Sample architectures of various classes have been implemented using the RTL processor synthesis including RISC architectures (LTRISC), VLIW architectures (LTVLIW) and DSPs (LTDSP).
The ICORE architecture has been developed for synchronization and channel estimation tasks in flexible and energy-efficient digital receiver chips for terrestrial digital video broadcasting (DVB-T). This processor was initially based on a largely conventional DSP instruction set of a typical load/store Harvard architecture. Its basic architecture has been implemented using the LISA design environment in order to tailor the architecture to the DVB-T application by adding specialized instructions.
We also developed an architecture which is assembly-compatible with the Motorola M68HC11 microcontroller. The goal was to reuse legacy application code for a Bluetooth application while increasing the performance of the architecture. During development of the LISA 68HC11 architecture, state-of-the-art architectural features and modern design aspects have been incorporated. The architecture is completely different from the original M68HC11 architecture. Unlike the original architecture, the new one is pipelined. Its pipeline has bypasses and is fully interlocked. Instruction and data memory were separated (Harvard architecture) and the bus width was increased from 8 bits to 16 bits. The instruction coding has been rearranged. The speedup achieved by this implementation at the same clock frequency is about 60%.
The Application-Specific Multirate DSP (ASMD) from Infineon Technologies is a small ASIP dedicated to interpolation and decimation. The ASMD LISA model is derived from an existing RTL version of the ASMD developed in VHDL. With the LISA model, changes and extensions to the architecture require very little design effort and design time, which enabled an easy reuse of the architecture. The evaluation of the ASMD with respect to a Bluetooth application uncovered the need for one additional instruction to increase the throughput significantly. For that reason the LISA model was changed, the RTL model regenerated and the new models verified within one day. Due to the large reduction in design time and the negligible overhead in speed and area, the generated version of the ASMD replaced the existing core and is now used in a Bluetooth device from Infineon.
David Kammler, Anupam Chattopadhyay, Oliver Schliebusch