Optimization for Retargetable Compilers

Motivation

Tools such as Processor Designer from Synopsys make it possible to rapidly design Application-Specific Processors (ASIPs). To use these processors efficiently in applications, programming support must be provided as well. This is done with retargetable compilers, which can be adapted to new processors. However, such compilers usually generate code that does not perform as well as hand-written assembly. Because assembly programming is costly and the resulting programs are hard to port, highly optimizing retargetable compilers are desirable.

The work in this project aims to close the gap between hand-written assembly and compiled C code by developing additional compiler optimizations. Examples of these optimizations are:

  • An optimization for Single Instruction Multiple Data (SIMD) instructions
  • An optimization that exploits the conditional execution feature found primarily in VLIW processors
  • Retargetable linear scan register allocation

In addition to being efficient, the optimizations also have to be retargetable. This requires a way to describe the machine-dependent part of each algorithm.

Project overview

All development is done on the CoSy platform from the Netherlands-based company ACE. There is a close research relationship with ACE, which includes student exchanges and regular meetings. The CoSy platform was chosen because it provides a powerful and extensible framework for building compilers. This allows us to focus on the actual optimizations while relying on the remainder of the compiler being up to the latest standards.

The CoSy framework consists of several modules called engines. The optimizations are implemented as a set of additional engines. They can then be adapted to a specific processor by means of configuration files that describe the specifics of that processor.

The following paragraphs offer more detailed descriptions of selected sub-projects that are part of this project.

SIMD optimization

Many embedded processors feature a set of multimedia extensions known as SIMD instructions. These interpret a single register as a short vector of elements of a smaller data type. Common configurations allow a 32-bit register to hold two 16-bit or four 8-bit values. A single arithmetic operation, such as an addition or a multiplication, can then be executed on all vector elements in a single step.
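The packed-register idea can be illustrated in plain C. The following is a minimal sketch that emulates a 2x16-bit addition on a 32-bit word with scalar operations; on a processor with SIMD support, a single instruction would perform this work. The function names pack2x16 and add2x16 are hypothetical and chosen for illustration only; they are not part of CoSy or of any particular instruction set.

```c
#include <stdint.h>

/* Pack two 16-bit values into one 32-bit word:
 * hi occupies bits 31..16, lo occupies bits 15..0. */
uint32_t pack2x16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Scalar emulation of a packed 2x16-bit addition. Each 16-bit lane
 * wraps around on its own; no carry crosses from the low lane into
 * the high lane. */
uint32_t add2x16(uint32_t a, uint32_t b)
{
    /* Add the lower 15 bits of each lane. With the top bit of every
     * lane cleared, a carry can never leave its lane. */
    uint32_t sum = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu);

    /* Add the top bit of each lane separately using XOR, which is an
     * addition without carry-out. */
    return sum ^ ((a ^ b) & 0x80008000u);
}
```

For example, adding the packed pairs (1, 0xFFFF) and (2, 1) yields (3, 0): the low lane wraps around without disturbing the high lane.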

 

A retargetable approach to SIMD optimization has been explored and integrated into the CoSy framework. Retargetability is achieved by reusing CoSy's code selection capabilities.

Conditional execution

Many processors, especially in the VLIW class, support conditional execution: an instruction is executed only if a Boolean condition, which is usually stored in a register, is met. If the condition is not met, the instruction behaves like a NOP. This feature is especially important on pipelined architectures, since it can be used to reduce the number of jump operations in the code. Because jumps incur pipeline stalls, this often results in improved runtime behavior.
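The effect of replacing a jump by a guarded operation can be sketched at the C level. The example below is an illustration under stated assumptions, not the transformation as implemented in the project: clip_branch uses an ordinary if statement, while clip_predicated computes a predicate and uses it to select the result without any control flow, which mimics an assignment that is guarded by a condition register. Both function names are hypothetical.

```c
#include <stdint.h>

/* Branching form: a compiler would typically translate the if
 * statement into a compare followed by a conditional jump. */
int32_t clip_branch(int32_t x, int32_t limit)
{
    if (x > limit)
        x = limit;
    return x;
}

/* Branch-free form: the assignment x = limit is guarded by the
 * predicate p. When p is 0 the guarded operation has no effect,
 * just like a conditionally executed instruction acting as a NOP. */
int32_t clip_predicated(int32_t x, int32_t limit)
{
    int32_t p = (x > limit);   /* predicate: 1 if condition met, else 0 */
    int32_t mask = -p;         /* all ones if p == 1, all zeros if p == 0 */
    return (limit & mask) | (x & ~mask);
}
```

Both functions return the same value for every input; only the second one can be executed without a jump.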

 

When used on VLIW processors, the advantage is even more evident. In addition to the elimination of branch delays, basic blocks can be merged. This leaves more freedom to the scheduler and therefore enables more instructions to be executed in parallel. However, this effect decreases with increasing size of the basic blocks. From a certain size onwards, there may be no noticeable effect or even a decrease in efficiency. It is therefore necessary to decide on a case-by-case basis whether the use of conditional instructions is profitable.
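Such a case-by-case decision can be illustrated with a deliberately simple cost estimate. The sketch below is a hypothetical model and not the heuristic used in this project: it assumes perfect packing of instructions into issue slots, a fixed branch penalty, and a worst-case path for the branching version. The type IfRegion and all function names are invented for this illustration.

```c
/* Hypothetical description of an if-then-else region. */
typedef struct {
    unsigned then_len;       /* instructions in the then-block */
    unsigned else_len;       /* instructions in the else-block */
    unsigned branch_penalty; /* cycles lost per jump (assumed fixed) */
    unsigned issue_width;    /* instructions issued per cycle, must be > 0 */
} IfRegion;

/* Cycles needed to issue n instructions, assuming perfect packing. */
unsigned issue_cycles(unsigned n, unsigned issue_width)
{
    return (n + issue_width - 1) / issue_width;   /* ceiling division */
}

/* Branching version: only one block runs (take the longer one as the
 * worst case), but the jump costs extra cycles. */
unsigned cost_branching(const IfRegion *r)
{
    unsigned longest = r->then_len > r->else_len ? r->then_len : r->else_len;
    return issue_cycles(longest, r->issue_width) + r->branch_penalty;
}

/* Predicated version: both blocks are merged into one basic block and
 * always issued, but no jump is needed. */
unsigned cost_predicated(const IfRegion *r)
{
    return issue_cycles(r->then_len + r->else_len, r->issue_width);
}

/* Returns 1 if the merged block is estimated to be cheaper, else 0. */
int if_conversion_profitable(const IfRegion *r)
{
    return cost_predicated(r) < cost_branching(r);
}
```

Under these assumptions, a region with 3 and 2 instructions on a 4-issue machine with a branch penalty of 2 cycles is estimated at 2 cycles when predicated and 3 cycles when branching, so conversion pays off. With 40 instructions in each block, the estimate is 20 cycles against 12, so the jump is kept.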

Contact

Felix Engel, Manuel Hohenauer, Maria Auras-Rodriguez, Sergey Yakoushkin

Partners

Publications

Hohenauer, M., Engel, F., Leupers, R., Ascheid, G., Meyr, H., Bette, G. and Singh, B.: Retargetable Code Optimization for Predicated Execution, in DATE (Munich, Germany), Mar. 2008


Godtmann, S., Lüders, H., Ascheid, G. and Vary, P.: A Bit-Mapping Strategy for Joint Iterative Channel Estimation and Turbo-Decoding, in Proceedings of the IEEE Fall Vehicular Technology Conference (VTC) (Calgary, Canada), Sep. 2008


Leupers, R. and Castrillon, J.: MPSoC Programming using the MAPS Compiler, in Proceedings of the 15th Asia and South Pacific Design Automation Conference (ASP-DAC '10) (Taipei, Taiwan), pp. 897--902, Jan. 2010, ISBN: 978-1-42445-766-3 ©2010 IEEE


Rákossy, Z. E., Hiromoto, M., Ochi, H. and Nakamura, Y.: A New Architecture Extension for Mitigation of Permanent Functional Unit Faults Using Hot-Swapping Concepts, in 15th Workshop on Synthesis And System Integration of Mixed Information technologies (Okinawa, Japan), pp. 177--182, 2009 ©2009 IEEE