DSPACE aims to develop a DSP for space applications that can be used as a stand-alone signal processor or as a building block for future MPSoCs.
New space missions, both scientific (Earth observation, atmospheric sounding, planetary exploration) and commercial (telecom), must handle large amounts of data and process them on board, applying first-level trigger procedures and compression algorithms. The ever-growing data flow calls for high on-board numerical computing capability so that the acquired information can be processed before it is sent to Earth.
On-board data processing is one of the most critical issues for any spacecraft. Higher computing capability and speed are needed at every system level: from control execution to data processing, from data transfer to storage. In particular, applications involving computer graphics or image/video processing are very expensive in terms of processing capability, memory access rate and storage size. In space applications, complex processing algorithms are needed in several fields: Earth observation and surveillance, SAR imaging, planetary observation, and human-interface displays for emerging manned space transportation systems.
Available DSP-based modules offer a typical computing power of 20 MIPS and 20 to 60 MFLOPS. The only European device currently suitable for these space applications, the TSC21020, is obsolete (40 MFLOPS sustained, 60 MFLOPS peak), and US-made alternatives are subject to export restrictions (ITAR). Although this level of performance was considered sufficient a few years ago, future missions will require much higher computational power.
This requirement, together with the strong need to reduce dependence on critical technologies from outside Europe, makes it mandatory to develop a next generation of European general-purpose, high-performance, radiation-tolerant DSPs, together with an efficient and reliable software development environment. The DSPACE project aims to develop a high-performance DSP for space applications, reaching up to 1 GFLOPS. Designed to be scalable, multi-purpose and easy to use, it is intended both as a stand-alone signal processor in embedded systems and as a building block for increasing the computational capability of future scientific missions.
The DSPACE project is funded by the European Union's Seventh Framework Programme (FP7/2007-2013), managed by the Research Executive Agency (REA, ec.europa.eu/research/rea), under Grant Agreement No. 262798. Its website is available at http://www.dspace-project.eu/
Processor Design Methodology
Besides the technical requirements, a new DSP must meet several other obligations to be a suitable candidate for a next-generation intellectual property core. It should be easy to adapt and extend so that it can cope with the applications, standards and algorithms of upcoming missions. Raising the abstraction level at which the architecture is modeled makes such extensions easier to develop in the future. In addition, it should come with a mature, high-quality software development environment modeled on other well-known toolchains, so that users face a short learning curve. This includes all the tools needed for code generation as well as a simulator of the processor core. As today's processor designs become more and more powerful, the effort needed to develop these tools rises significantly; developing a cycle-accurate simulator in particular is a tedious and error-prone task. To cope with these requirements, the proposed DSP is not developed directly in a traditional hardware description language (e.g. VHDL or Verilog), but in LISA (Language for Instruction Set Architectures) using the associated tool, Synopsys Processor Designer.
LISA is an architecture description language that describes a processor's behavior as well as its structure, including register files, execution units, pipeline stages and memory interface, at a higher abstraction level than RTL. The tool flow is shown in Figure 1. From a single LISA description of the target architecture, a complete software development environment can be generated, including assembler, disassembler, linker and a cycle-accurate simulator. Synthesizable VHDL or Verilog code can also be generated from the same reference. This not only avoids a great deal of manual, error-prone work, but also ensures that the generated software tools and the hardware are consistent with each other by construction. This accelerated and automated development cycle considerably reduces the development time of the processor core and improves its maintainability.
As space-qualified standard cell libraries are typically slower than leading-edge libraries for other application domains, technology scaling by itself does not provide a sufficient speed-up. The processor core is therefore planned to run at a moderate clock speed of 125 MHz while using a VLIW architecture with a high degree of instruction-level parallelism. The core executes eight RISC-like 32-bit instructions per cycle. The clustered architecture, shown in Figure 3, consists of eight functional units grouped into two identical data paths (clusters). Each data path has its own 32 general-purpose registers, each 32 bits wide, although the routing capabilities allow the first operand of an instruction to come from either cluster. There are four functional units on each data path: two identical Arithmetic-Logical Units (ALU), one multiplier unit (MUL) and one Address Generation Unit (AGU). The ALU offers 92 different instructions, supporting both integer and IEEE 754 single-precision floating-point formats. The MUL offers 24 different instructions, including 32x32-bit multiplication, 16x16-bit multiplication and single-precision floating-point multiplication. The AGU offers 35 instructions covering a large set of memory addressing modes. To sustain high instruction throughput, the processor has a 7-stage pipeline with single-cycle throughput.
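As a rough cross-check of the unit counts above, the following Python sketch tallies the issue slots and the peak rates they would imply at 125 MHz. It assumes, beyond what the text states, that every functional unit issues one operation per cycle and that only the ALU and MUL units count as floating-point capable; all constant names are illustrative.

```python
# Figures taken from the architecture description above.
CLOCK_MHZ = 125
CLUSTERS = 2
UNITS_PER_CLUSTER = {"ALU": 2, "MUL": 1, "AGU": 1}

# Assumption: ALU and MUL handle single-precision floating point, the AGU does not.
FP_CAPABLE = {"ALU", "MUL"}

# Total number of functional units, i.e. instructions issued per cycle.
issue_width = CLUSTERS * sum(UNITS_PER_CLUSTER.values())

# Number of units that can execute a floating-point operation.
fp_units = CLUSTERS * sum(
    count for unit, count in UNITS_PER_CLUSTER.items() if unit in FP_CAPABLE
)

# Assumption: one operation per unit per cycle (single-cycle throughput).
peak_mops = issue_width * CLOCK_MHZ
peak_mflops = fp_units * CLOCK_MHZ

print(f"issue width: {issue_width} instructions/cycle")
print(f"FP-capable units: {fp_units}")
print(f"peak: {peak_mops} MOPS, {peak_mflops} MFLOPS")
```

Under these assumptions the tally gives 8 issue slots, 6 of which are floating-point capable.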
Gate-Level Synthesis Results
The processor's VHDL code, generated by Synopsys Processor Designer, was validated with an extensive test suite after several iterations of architecture exploration. It was synthesized with Synopsys Design Compiler using a 180 nm standard-cell library for a typical use-case scenario. The resulting core reaches an aggregated peak performance of 1.0 GOPS and 750 MFLOPS, roughly 17 and 13 times, respectively, the performance of Atmel's TSC21020F processor. The processor core has an area of about 380 kGates. The envisioned final target technology is STMicroelectronics' DSM 65a library, based on a new space-qualified 65 nm technology. Given the synthesis results obtained with the 180 nm cell library, the goal of an aggregated peak performance of 1 GFLOPS with 6 floating-point functional units is expected to be achieved.
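The quoted ratios and the 1 GFLOPS target can be checked with a few lines of arithmetic. The sketch below assumes that both ratios are taken against the TSC21020's 60 MFLOPS peak figure and that each of the 6 floating-point units completes one operation per cycle; neither assumption is stated explicitly in the text, and the variable names are illustrative.

```python
# Synthesis results quoted for the 180 nm library.
peak_mops = 1000       # 1.0 GOPS
peak_mflops = 750

# Assumption: the comparison baseline is the TSC21020 peak rate of 60 MFLOPS.
baseline_peak_mflops = 60

# Target quoted for the final 65 nm technology.
target_mflops = 1000   # 1 GFLOPS
fp_units = 6

ops_ratio = peak_mops / baseline_peak_mflops      # about 16.7, quoted as 17
flops_ratio = peak_mflops / baseline_peak_mflops  # 12.5, quoted as 13

# Assumption: one floating-point operation per unit per cycle.
required_clock_mhz = target_mflops / fp_units

print(f"GOPS ratio:   {ops_ratio:.1f}x")
print(f"MFLOPS ratio: {flops_ratio:.1f}x")
print(f"clock needed for 1 GFLOPS with {fp_units} FP units: {required_clock_mhz:.0f} MHz")
```

On these assumptions, reaching 1 GFLOPS with 6 units would need a clock of roughly 167 MHz, compared with the 125 MHz planned for the core.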
As the timeframe of the DSPACE project did not allow a new C compiler to be developed from scratch, a different approach was taken. The flow is presented in Figure 4. The Glue Software receives assembler code produced by an existing GCC backend and converts it into DSPACE linear assembler. The Code Optimizer then turns this linear assembler into parallel DSPACE assembler, performing register allocation as well as scheduling. This flow makes it possible to offer a complete C toolchain in a limited amount of time.
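To illustrate the scheduling step in general terms, here is a minimal Python sketch of greedy list scheduling that packs a linear instruction sequence into parallel bundles. It is not the project's Code Optimizer: the `Instr` type, the mnemonics and the register names are hypothetical, every instruction is assumed to have a latency of one cycle, register allocation is assumed to be already done, and memory dependences are ignored. Only the per-cycle slot counts (4 ALU, 2 MUL, 2 AGU) follow from the architecture description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Slots per cycle across both clusters (2 ALUs, 1 MUL, 1 AGU per cluster).
SLOTS_PER_CYCLE = {"ALU": 4, "MUL": 2, "AGU": 2}


@dataclass(frozen=True)
class Instr:
    text: str                    # illustrative mnemonic, not real DSPACE syntax
    unit: str                    # "ALU", "MUL" or "AGU"
    dst: Optional[str] = None    # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read


def schedule(linear: List[Instr]) -> List[List[Instr]]:
    """Place each instruction, in program order, into the earliest cycle
    that respects register dependences and the slot limits."""
    bundles: List[List[Instr]] = []
    last_write: Dict[str, int] = {}  # register -> cycle of its latest writer
    last_read: Dict[str, int] = {}   # register -> latest cycle that reads it

    for ins in linear:
        earliest = 0
        for reg in ins.srcs:  # read-after-write: issue after the producer
            if reg in last_write:
                earliest = max(earliest, last_write[reg] + 1)
        if ins.dst is not None:
            if ins.dst in last_write:  # write-after-write: keep write order
                earliest = max(earliest, last_write[ins.dst] + 1)
            if ins.dst in last_read:   # write-after-read: not before readers
                earliest = max(earliest, last_read[ins.dst])

        cycle = earliest
        while True:
            while cycle >= len(bundles):
                bundles.append([])
            used = sum(1 for other in bundles[cycle] if other.unit == ins.unit)
            if used < SLOTS_PER_CYCLE[ins.unit]:
                break
            cycle += 1  # all slots of this unit type are taken, try next cycle

        bundles[cycle].append(ins)
        for reg in ins.srcs:
            last_read[reg] = max(last_read.get(reg, 0), cycle)
        if ins.dst is not None:
            last_write[ins.dst] = cycle
    return bundles


# Hypothetical linear input: two loads, a multiply, and two additions.
linear = [
    Instr("ld  r1, [r10]", "AGU", "r1", ("r10",)),
    Instr("ld  r2, [r11]", "AGU", "r2", ("r11",)),
    Instr("mul r3, r1, r2", "MUL", "r3", ("r1", "r2")),
    Instr("add r4, r3, r5", "ALU", "r4", ("r3", "r5")),
    Instr("add r6, r5, r5", "ALU", "r6", ("r5",)),
]

bundles = schedule(linear)
for cycle, bundle in enumerate(bundles):
    print(cycle, " || ".join(ins.text for ins in bundle))
```

In this example the five linear instructions fit into three bundles, because the two loads and the independent addition can issue together in the first cycle. A production scheduler would also have to model real instruction latencies, memory ordering and the cross-cluster operand routing.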
For testing, validation and demonstration purposes, a demonstration board is currently being designed. The board, in CompactPCI 3U format, will host a Xilinx Kintex-7 FPGA containing the DSP core, along with various I/O interfaces and memory. A host application will be able to download applications directly to this board and execute them.
- Space Applications Services NV
- Sitael Aerospace SRL
- Intecs Informatica E Tecnologia Del Software S.P.A.
- Consorzio Pisa Ricerche Pisa