SW Tools

Tool Suite

The LPDP application software development tool suite includes a compiler, assembler, linker and simulator, as well as a graphical debugger frontend. Together, these tools form a complete software development environment that covers everything from the C/assembly source code up to simulation within a convenient graphical debugger frontend.

The tools are enhanced versions of the tools used for architecture exploration. For the software simulator, the enhancements add the ability to profile the application under test and to visualize the debugging process graphically. The LISA debugger frontend ldb is a generic GUI for the generated LISA simulator. It visualizes the internal state of the simulation process: the C source code and the disassembly of the application are displayed, together with all configured memories and (pipeline) registers. All contents can be changed in the frontend while the application is running. The progress of the simulator can be controlled by stepping and running through the application and by setting breakpoints/watchpoints.

Simulator

In the architecture exploration phase, the software simulator has to fulfill two major requirements: simulation accuracy, to obtain sensible profiling data, and speed, for fast verification. Since various architecture implementations will be explored, meaningful benchmarking results and profiling data are only obtained from simulation with large test-vector sets. Therefore, high throughput is mandatory even in the early stages of the architecture design. Besides simulation performance, the variety of programmable architectures targeted by LISA places high demands on the flexibility of the simulator. This flexibility is reflected in the multitude of simulation techniques applicable to the LISA simulator, ranging from highly specialized, application-specific compiled simulation to more general-purpose interpretive simulation. The selection of the proper technique is driven by a trade-off between performance and flexibility.

Compiled ISA Simulator

Under the assumption of a constant program memory, which holds for most DSP architectures, the compiled simulation technique can be used. The objective of compiled simulation is to reduce simulation time by exploiting a priori knowledge about the application before simulating it [3]. Frequently executed operations are accelerated because instruction decoding and operation sequencing are performed in an additional step before the simulation is run. This step is executed by a tool called the simulation compiler. Because of the high locality of traditional DSP code (consider, e.g., small FIR loops), this technique usually improves performance by one to two orders of magnitude. Clearly, the simulation compiler is a highly architecture-specific tool: developing a custom simulation compiler, tailoring it to architectural characteristics and verifying it is an extremely lengthy and error-prone task. Since retargeting the simulation compiler from a LISA architecture description automates this step, compiled simulation becomes usable in the architecture exploration phase. To achieve the best possible simulation performance, the simulation compiler uses three different scheduling principles: instruction-based code translation, dynamic scheduling, and static scheduling. The selection of the proper technique depends on the model accuracy and on the application size and type. While the first technique can be applied to instruction-set accurate models or to cycle-accurate models without an instruction pipeline, the latter two can be used for pipelined models as well.
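The core idea of moving instruction decoding ahead of the simulation run can be sketched in Python. This is a minimal illustration under stated assumptions, not the LISA simulation compiler. The four-register toy ISA, its textual encoding and the `decode` helper are hypothetical. Python closures stand in for the host code that a real simulation compiler would generate. Both simulators execute the same program and differ only in how often `decode` is called.

```python
"""Interpretive vs. compiled simulation on a hypothetical toy ISA."""
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Op = Callable[[State], None]


def decode(word: str) -> Op:
    """Translate one textual instruction into an executable closure."""
    mnem, *args = word.split()
    if mnem == "LI":  # LI rd imm: rd <- imm
        rd, imm = args[0], int(args[1])

        def op(s: State) -> None:
            s[rd] = imm
            s["pc"] += 1
    elif mnem == "ADD":  # ADD rd ra rb: rd <- ra + rb
        rd, ra, rb = args

        def op(s: State) -> None:
            s[rd] = s[ra] + s[rb]
            s["pc"] += 1
    elif mnem == "DEC":  # DEC rd: rd <- rd - 1
        rd = args[0]

        def op(s: State) -> None:
            s[rd] -= 1
            s["pc"] += 1
    elif mnem == "BNZ":  # BNZ rs target: branch if rs != 0
        rs, target = args[0], int(args[1])

        def op(s: State) -> None:
            s["pc"] = target if s[rs] != 0 else s["pc"] + 1
    elif mnem == "HALT":

        def op(s: State) -> None:
            s["halt"] = 1
    else:
        raise ValueError(f"unknown instruction: {word!r}")
    return op


def fresh_state() -> State:
    return {"pc": 0, "halt": 0, "r0": 0, "r1": 0, "r2": 0, "r3": 0}


def run_interpretive(program: List[str]) -> Tuple[State, int]:
    """Fetch, decode and execute every instruction at run time."""
    s, decodes = fresh_state(), 0
    while not s["halt"]:
        op = decode(program[s["pc"]])
        decodes += 1
        op(s)
    return s, decodes


def run_compiled(program: List[str]) -> Tuple[State, int]:
    """Decode the whole (constant) program once, then only execute."""
    table = [decode(word) for word in program]  # "simulation compiler" step
    s = fresh_state()
    while not s["halt"]:
        table[s["pc"]](s)
    return s, len(table)


# Sum 10 + 9 + ... + 1 into r0 with a three-instruction loop body.
PROGRAM = [
    "LI r0 0",
    "LI r1 10",
    "ADD r0 r0 r1",  # address 2: loop start
    "DEC r1",
    "BNZ r1 2",
    "HALT",
]

if __name__ == "__main__":
    state, n = run_interpretive(PROGRAM)
    print("interpretive:", state["r0"], "decodes:", n)
    state, n = run_compiled(PROGRAM)
    print("compiled:    ", state["r0"], "decodes:", n)
```

Both runs leave 55 in r0. The interpretive run calls `decode` 33 times, once per executed instruction. The compiled run calls it 6 times, once per program word. The gap widens with the loop count, which is the locality effect that compiled simulation exploits.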

Just-In-Time Cache Compiled ISA Simulator

Although compiled simulation has the obvious benefit of very high simulation performance, it also places certain requirements on the architecture model and the application, which significantly limit its applicability. The basic idea of the just-in-time cache compiled simulator (JIT-CCS) is to memorize the information extracted in the decoding (compilation) phase of an instruction for later re-use in case of repeated execution. Although the JIT-CCS combines the benefits of both compiled and interpretive simulation, it does not achieve the performance of statically scheduled simulation. Depending on the architecture model and application, static scheduling techniques might require a complex program flow analysis that considers more than one instruction at a time, which is unsuitable for (semi-)interpretive simulators. Therefore, the JIT-CCS uses dynamic scheduling exclusively for instruction decoding/compilation. Since instruction decoding is performed on the instruction register contents at simulator run time, external memories are supported and changes to program memory are honored. Furthermore, the performance of compiled simulation can be approximated thanks to the caching of the decoded information. The additional overhead caused by moving the decoding step into simulator run time has proven to cause only a negligible decrease in simulation speed. Moreover, the relative impact of the decoding process on simulation speed decreases strongly as the number of loop iterations within an application increases.
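The decode-once, reuse-later behavior described above can be sketched with a small cache keyed by instruction address. This sketch makes one simplifying assumption: each cache entry also stores the instruction word it was decoded from, and the simulator re-decodes whenever the word now in the instruction register differs from the stored one. This is one plausible way to honor program-memory changes, not necessarily how the JIT-CCS does it. The class, its methods and the toy ISA are hypothetical.

```python
"""Toy just-in-time cache-compiled simulation on a hypothetical ISA."""
from typing import Dict, List, Tuple

Decoded = Tuple[str, Tuple[int, ...]]


class JitCacheSim:
    """Decode an instruction on first execution, then reuse the cached result."""

    def __init__(self, memory: List[str]) -> None:
        self.mem = memory  # program memory; may be modified while simulating
        self.cache: Dict[int, Tuple[str, Decoded]] = {}  # addr -> (word, decoded)
        self.decodes = 0
        self.hits = 0
        self.reset()

    def reset(self) -> None:
        """Reset the architectural state but keep the simulation cache."""
        self.regs = [0, 0, 0, 0]
        self.pc = 0
        self.halted = False

    def _decode(self, word: str) -> Decoded:
        self.decodes += 1
        mnem, *args = word.split()
        return mnem, tuple(int(a) for a in args)

    def step(self) -> None:
        ir = self.mem[self.pc]  # instruction fetch into the instruction register
        entry = self.cache.get(self.pc)
        if entry is not None and entry[0] == ir:
            self.hits += 1  # same word as last time: reuse the decoded form
            decoded = entry[1]
        else:
            decoded = self._decode(ir)  # first visit, or program memory changed
            self.cache[self.pc] = (ir, decoded)
        self._execute(decoded)

    def _execute(self, decoded: Decoded) -> None:
        mnem, a = decoded
        r = self.regs
        if mnem == "LI":  # LI rd imm
            r[a[0]] = a[1]
        elif mnem == "ADD":  # ADD rd ra rb
            r[a[0]] = r[a[1]] + r[a[2]]
        elif mnem == "DEC":  # DEC rd
            r[a[0]] -= 1
        elif mnem == "BNZ":  # BNZ rs target
            if r[a[0]] != 0:
                self.pc = a[1]
                return
        elif mnem == "HALT":
            self.halted = True
            return
        else:
            raise ValueError(f"unknown instruction: {mnem}")
        self.pc += 1

    def run(self) -> int:
        while not self.halted:
            self.step()
        return self.regs[0]


if __name__ == "__main__":
    memory = ["LI 0 0", "LI 1 10", "ADD 0 0 1", "DEC 1", "BNZ 1 2", "HALT"]
    sim = JitCacheSim(memory)
    print(sim.run(), sim.decodes, sim.hits)

    memory[1] = "LI 1 3"  # patch program memory between runs
    sim.reset()
    print(sim.run(), sim.decodes, sim.hits)
```

The first run prints `55 6 27`: six decodes, one per address, and every later fetch is a cache hit. After the patch, the second run prints `6 7 38`. Only the modified word at address 1 is decoded again, and everything else is still served from the cache.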

The fact that the JIT-CCS operates on instruction registers instead of program memory gives a number of additional benefits which are usually only provided by interpretive simulators. Concerning flexibility, the choice of the resource from which the instruction word is loaded into the instruction register (instruction fetch) is entirely up to the model developer. This enables the modeling of, e.g., memory management units (MMUs) controlling whole memory hierarchies, or out-of-order dispatchers as found in superscalar architectures.

Furthermore, the JIT-CCS allows scalable simulation, since the size and replacement strategy of the simulation cache can be varied. Hence, simulation performance can be traded against host memory requirements and optimized for a specific architecture or even a specific application. An efficient balance between these parameters is generally achieved with a cache large enough to hold the entire body of the application's largest loop, which avoids cache pollution.
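The sizing rule can be checked with a toy model that counts decodes for the address trace of a single loop. The text does not say which replacement strategies the simulation cache offers, so this sketch assumes least-recently-used (LRU) replacement. The function names are hypothetical.

```python
"""How simulation-cache capacity interacts with loop size (toy model)."""
from collections import OrderedDict
from typing import Iterable, List


def count_decodes(fetch_trace: Iterable[int], capacity: int) -> int:
    """Count decodes for an address trace, given an LRU cache of decoded
    instructions that holds at most `capacity` entries."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    decodes = 0
    for addr in fetch_trace:
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
        else:
            decodes += 1  # miss: decode and insert
            cache[addr] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return decodes


def loop_trace(body_len: int, iterations: int) -> List[int]:
    """Addresses fetched by a loop whose body occupies addresses 0..body_len-1."""
    return [addr for _ in range(iterations) for addr in range(body_len)]


if __name__ == "__main__":
    trace = loop_trace(body_len=8, iterations=100)  # 800 fetches
    for capacity in (4, 7, 8, 16):
        print(capacity, count_decodes(trace, capacity))
```

With LRU replacement, any cache smaller than the 8-instruction loop body misses on every fetch and performs 800 decodes. A cache of 8 or more entries decodes each instruction only once. A cyclic trace slightly larger than the cache is the worst case for LRU. Other replacement strategies degrade more gracefully in that case, which is one reason a configurable strategy is useful.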

Assembler / Linker

The LISA assembler processes textual assembly source code and transforms it into linkable object code for the target architecture. The transformation is governed by the instruction-set information defined in a LISA processor description. Besides the processor-specific instruction set, the generated assembler provides a set of pseudo-instructions (directives) to control the assembly process and to initialize data. Section directives enable the grouping of assembled code into sections, which the linker can position separately in memory. Symbolic identifiers for numeric values and addresses are standard assembler features and are also supported by LISA. Especially in the domain of application-specific processors, an HLL compiler is often not required, since the application is very small and needs to be highly optimized, which can only be achieved by hand coding. For these architectures, a C-like algebraic assembly syntax is a suitable alternative to mnemonic-based instruction formats. Besides ASCII debug listing files, the assembler outputs a linkable object file in the LOF (LISA object file) format.

The linking process is controlled by a linker command file, which holds a detailed model of the target memory environment and an assignment table mapping the module sections to their respective target memories. Moreover, the linker can be supplied with an additional memory model. This model is separate from the memory configuration in the LISA description and allows code to be linked into external memories outside the architecture model.
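The role of the memory model and the section assignment table can be illustrated with a small placement routine. Several details are illustrative assumptions: the data structures, region names, sizes and the simple "pack in input order" rule. They do not reproduce the LISA linker command-file syntax. A real linker would also handle alignment, relocation and symbol resolution.

```python
"""Minimal sketch of section-to-memory placement driven by a memory model."""
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MemoryRegion:
    name: str
    origin: int  # first address of the region
    length: int  # size of the region in address units


def place_sections(
    memories: List[MemoryRegion],
    sections: List[Tuple[str, int]],  # (section name, size) in input order
    assignment: Dict[str, str],  # section name -> memory region name
) -> Dict[str, int]:
    """Return the base address chosen for each section.

    Sections assigned to the same region are packed one after another,
    in input order and without alignment padding.
    """
    regions = {m.name: m for m in memories}
    cursor = {m.name: m.origin for m in memories}  # next free address per region
    placement: Dict[str, int] = {}
    for name, size in sections:
        region = regions[assignment[name]]
        base = cursor[region.name]
        if base + size > region.origin + region.length:
            raise ValueError(f"section {name} does not fit into {region.name}")
        placement[name] = base
        cursor[region.name] = base + size
    return placement


if __name__ == "__main__":
    memories = [
        MemoryRegion("PROG", 0x0000, 0x1000),
        MemoryRegion("DATA", 0x8000, 0x0800),
        MemoryRegion("EXT", 0x10000, 0x10000),  # external memory outside the model
    ]
    sections = [(".text", 0x200), (".data", 0x100), (".bss", 0x80), (".coeffs", 0x400)]
    assignment = {".text": "PROG", ".data": "DATA", ".bss": "DATA", ".coeffs": "EXT"}
    for name, base in place_sections(memories, sections, assignment).items():
        print(f"{name:8s} -> 0x{base:05X}")
```

In this example, `.data` and `.bss` share the DATA region and are placed back to back at 0x8000 and 0x8100. The `.coeffs` section lands in the hypothetical external region EXT. A section that exceeds its region is rejected rather than silently overlapping its neighbor.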

C Compiler

The generation of a C compiler from LISA is currently a topic of research. As the figure below shows, we use the CoSy Compiler Development System (a registered trademark of ACE b.v.) to generate a C compiler from a LISA processor description. CoSy follows a modular, engine-based concept to perform parsing and semantic analysis of the input files in the front end, optimizations and transformations of the compiler's IR, and code generation in the compiler's backend. The compiler's backend engines are generated by a tool called BEG, which reads so-called code generator description (CGD) files and generates the code selector, scheduler, and register allocator. The CGD files are created from the LISA processor description by the LISA processor compiler, the tool that reads LISA processor descriptions and generates all software tools and the hardware model.

 

We are currently analyzing which parts of the CGD description can be generated automatically from the LISA processor description and which components require user interaction. For compiler aspects that rely on information not contained in the LISA processor model (e.g., the application binary interface (ABI)), a graphical user interface will be provided that intuitively combines the information from the LISA model with the user's input.

LISA Models

Fully functional and complete LISA models have been developed for the following architectures:

Cycle-based:

  • Texas Instruments TMS320C62x
  • Texas Instruments TMS320C54x
  • MIPS32 4K
  • LEON SPARC
  • ICORE
  • many further proprietary architectures

Instruction-based:

  • Analog Devices ADSP2101
  • ARM7
  • Motorola DSP 56002
  • MHS 80C51
  • many further proprietary architectures

The generated tools of all models were accurately verified against the vendor tools.

LISA Debugger

  • Easy choice of simulation technique (interpretive, compiled, just-in-time)
    - Typical simulation speed of 5-10 MIPS 
    - Just-in-time cache compiled simulation combines the speed of
      compiled simulation with the full flexibility of interpretive simulation

  • Assembly source code and disassembly view

  • LISA source code debug
    - Behavioral code of the LISA machine description
    - Full and detailed visibility of architecture behavior

  • Editable register and memory windows

  • Breakpoint and watchpoint setting
    - Conditional breakpoints

  • Application profiling
    - Instruction execution
    - Pipeline effects (stall, flush)
    - Loop recognition

  • Architecture profiling
    - Pipeline utilization
    - Functional unit utilization
    - Resource profiling (Read, Write, Access)
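The instruction-execution and loop-recognition entries above can be approximated offline from a trace of executed program addresses. The text does not describe how the LISA profiler recognizes loops. This sketch assumes a simple back-edge heuristic: any transfer to an address not greater than the current one is treated as closing a loop. The trace format and the `profile` function are hypothetical.

```python
"""Execution counts and loop recognition from a program-counter trace (toy)."""
from collections import Counter
from typing import Dict, List, Tuple


def profile(pc_trace: List[int]) -> Tuple[Counter, Dict[Tuple[int, int], int]]:
    """Return per-address execution counts and the loops found in the trace.

    A loop is assumed to be any control transfer to an address that is not
    greater than the current one (a back edge). Each loop is identified by
    its (start, end) addresses. Assuming the loop is entered only once, its
    iteration count is the number of back edges plus one.
    """
    exec_counts = Counter(pc_trace)
    back_edges: Counter = Counter()
    for prev, cur in zip(pc_trace, pc_trace[1:]):
        if cur <= prev:
            back_edges[(cur, prev)] += 1
    loops = {span: n + 1 for span, n in back_edges.items()}
    return exec_counts, loops


if __name__ == "__main__":
    # Trace of a 3-instruction loop (addresses 2..4) executed 10 times.
    trace = [0, 1] + [2, 3, 4] * 10 + [5]
    counts, loops = profile(trace)
    print(counts[2], counts[5], loops)
```

For this trace, address 2 executes 10 times and address 5 executes once. The loop spanning addresses 2 to 4 is reported with 10 iterations. Nested loops, or loops entered several times, would need per-entry bookkeeping that this sketch omits.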

LISA Simulator API

  • Reworked API

  • Breakpoint and watchpoint control

  • Simulator control (step, next, run)

  • Profiling data access

  • Resource access via resource API (registers, memories)

  • Multiple instantiation of the simulator class

  • Choice of simulation technique at run-time

  • Loading COFF object files
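The breakpoint/watchpoint control, simulator control and multiple instantiation listed above can be illustrated with a toy simulator class. The class, its method names (`set_breakpoint`, `set_watchpoint`, `step`, `run`) and the tuple-encoded program are hypothetical. They do not reproduce the LISA simulator API. The sketch only shows how these features can fit together when each instance owns its own state.

```python
"""Hypothetical simulator object with breakpoint and watchpoint control."""
from typing import Callable, Dict, List, Optional, Set, Tuple

Regs = Dict[str, int]
Condition = Callable[[Regs], bool]


class ToySimulator:
    """Each instance owns its own state, so several can run side by side."""

    def __init__(self, program: List[Tuple]) -> None:
        self.program = program
        self.regs: Regs = {"pc": 0, "r0": 0, "r1": 0}
        self.halted = False
        self.breakpoints: Dict[int, Optional[Condition]] = {}
        self.watchpoints: Set[str] = set()

    def set_breakpoint(self, addr: int, condition: Optional[Condition] = None) -> None:
        self.breakpoints[addr] = condition  # None means unconditional

    def set_watchpoint(self, reg: str) -> None:
        self.watchpoints.add(reg)

    def step(self) -> Optional[str]:
        """Execute one instruction; report a watched register that changed."""
        before = {reg: self.regs[reg] for reg in self.watchpoints}
        op, *args = self.program[self.regs["pc"]]
        r = self.regs
        next_pc = r["pc"] + 1
        if op == "LI":
            r[args[0]] = args[1]
        elif op == "ADD":
            r[args[0]] = r[args[1]] + r[args[2]]
        elif op == "DEC":
            r[args[0]] -= 1
        elif op == "BNZ":
            if r[args[0]] != 0:
                next_pc = args[1]
        elif op == "HALT":
            self.halted = True
            next_pc = r["pc"]
        else:
            raise ValueError(f"unknown instruction: {op}")
        r["pc"] = next_pc
        for reg, old in before.items():
            if r[reg] != old:
                return f"watchpoint {reg}"
        return None

    def run(self, max_steps: int = 100_000) -> str:
        """Run until halt, a triggered breakpoint/watchpoint, or the step limit."""
        for i in range(max_steps):
            if self.halted:
                return "halted"
            pc = self.regs["pc"]
            # Skip the check on the first step so run() can resume from a breakpoint.
            if i > 0 and pc in self.breakpoints:
                cond = self.breakpoints[pc]
                if cond is None or cond(self.regs):
                    return f"breakpoint {pc}"
            reason = self.step()
            if reason is not None:
                return reason
        return "step limit"


PROGRAM = [
    ("LI", "r0", 0),
    ("LI", "r1", 10),
    ("ADD", "r0", "r0", "r1"),  # address 2: loop start
    ("DEC", "r1"),
    ("BNZ", "r1", 2),
    ("HALT",),
]

if __name__ == "__main__":
    a = ToySimulator(PROGRAM)
    a.set_breakpoint(3, lambda regs: regs["r1"] == 7)  # conditional breakpoint
    b = ToySimulator(PROGRAM)  # independent second instance
    b.set_watchpoint("r0")
    print(a.run(), a.regs["r0"])
    print(b.run(), b.regs["r0"])
    print(a.run(), a.regs["r0"])  # resume the first instance to completion
```

Instance `a` stops at address 3 the first time `r1` equals 7, with 34 accumulated in `r0`. Instance `b` stops after the first instruction that actually changes `r0` (the first `ADD`). Resuming `a` runs it to completion with 55 in `r0`. Neither instance affects the other.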

LISA Development Environment

  • Maintain and create LISA projects

  • Build and configure processor tool suite

  • Debug LISA model

LISA Instruction Set Debugger

  • Debug the model instruction set
  • LISA source-code view
  • Set breakpoints
  • Step & run mode