Publication: Hardware and software design methodologies for portability, flexibility and versatility in multi-standard MIMO baseband processing 

Guenther, D.
Ph. D. Dissertation
RWTH Aachen University
Institute for Integrated Signal Processing Systems
Jul. 2017

DOI 10.18154/RWTH-2017-07440



In modern wireless communications, the number of communication standards that a communication device has to implement rises along with exponentially increasing data rates. Therefore, the software-defined radio (SDR) concept envisions a flexible, mostly programmable communication platform that can be adapted to new standards by means of software updates. The conflict between flexibility and versatility on the one hand, and efficiency on the other hand, is a significant challenge for this approach, because flexibility and versatility often come at the expense of increased energy consumption and silicon area. Minimizing this trade-off is a central topic of this thesis. To this end, design paradigms for flexible programmable processors and versatile non-programmable circuits, both with high efficiency, are developed and demonstrated by case studies. Another crucial aspect of SDR is ensuring software portability while maintaining high efficiency, since efficient software is often highly tailored to its target architecture. In response, this work presents concepts for the development of efficient portable baseband software, accompanied by implementation case studies.

To investigate software portability, the receiver baseband signal processing of IEEE 802.11n wireless LAN and the cellular LTE standard was implemented on two commercial SDR architectures. The target applications were analyzed on an algorithmic level and decomposed into their computationally complex kernels (Nuclei). For these kernels, highly optimized, platform-specific implementations (Flavors) were developed on both target architectures. The function interface of these Flavors, on the other hand, remains generic, so that the target application can be composed of calls from a platform-independent frame code that represents the control flow of the application. By doing so, a new communication standard can be implemented by adapting the frame code and potentially adding missing Flavors.

Application-specific instruction set processors (ASIPs) are often used to overcome the efficiency-flexibility gap between specialized circuits and generic programmable processors. Typical baseband ASIPs exhibit a high degree of complexity in order to compete with tailored, non-programmable circuits, which leads to poor flexibility and programmability. Therefore, this work pursues an alternative concept called the lean design method. This method aims to identify the simplest architecture for a given task and then make this architecture as efficient as possible. A slim and easily programmable vector processor was developed as a case study to meet the requirements of multi-antenna baseband processing. To improve ease of use and to avoid costly numerical stabilization, the processor uses efficient floating-point arithmetic. A data path with a flexible routing and permutation network and efficient bypassing ensures high utilization of the functional units. The data path can also be adapted to the numerical requirements of the target application at runtime by masking the floating-point mantissa. The processor was laid out in a 90 nm CMOS technology to verify the promised efficiency gain.

If a flexible architecture does not provide sufficient performance for a certain application domain, programmability often has to be given up. Linear multi-antenna precoding based on singular value decomposition (SVD) for IEEE 802.11ac with up to eight transmit antennas was selected as an exemplary use case for such a situation. A versatile precoding architecture has to support the maximum use case as well as smaller antenna configurations. Therefore, the cyclic Jacobi algorithm for SVD was adapted so that it can decompose larger matrices entirely based on 2 × 2 vector arithmetic. Additionally, a number of numerical parameters can be adapted to the requirements of the use case at hand. The resulting precoder was laid out in a 90 nm CMOS technology and benchmarked with respect to silicon area and energy efficiency.

Finally, the efficiency of the precoder was evaluated in the context of a MAC layer application based on IEEE 802.11ac. The resulting multi-dimensional design space includes antenna configurations, modulation schemes, etc., as well as several numerical parameters. Within this design space, the system was optimized with regard to different criteria (e.g., spectral efficiency, energy efficiency, latency). The versatility of the precoder architecture, in the sense of efficient support for the entire design space, was instrumental in achieving the different optimization goals.
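
The abstract mentions that the vector processor's data path can be adapted at runtime by masking the floating-point mantissa. The following is a minimal software sketch of that idea, not the dissertation's hardware design. It assumes IEEE 754 single precision (23 stored mantissa bits) and plain truncation of the least significant mantissa bits; the function name `mask_mantissa` and the parameter `kept_bits` are hypothetical and chosen only for illustration.

```python
import struct

MANTISSA_BITS = 23  # stored mantissa width of IEEE 754 binary32 (assumed format)


def mask_mantissa(x: float, kept_bits: int) -> float:
    """Return x as a binary32 value with only the top `kept_bits` mantissa bits kept.

    Hypothetical helper: the low (23 - kept_bits) mantissa bits are forced to
    zero, which truncates the magnitude toward zero. Sign and exponent are
    left untouched.
    """
    if not 0 <= kept_bits <= MANTISSA_BITS:
        raise ValueError("kept_bits must be between 0 and 23")
    # Reinterpret the single-precision encoding of x as a 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    dropped = MANTISSA_BITS - kept_bits
    # Mask with ones everywhere except the dropped low mantissa bits.
    mask = (0xFFFFFFFF << dropped) & 0xFFFFFFFF
    # Reinterpret the masked integer as a single-precision float again.
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


if __name__ == "__main__":
    for kept in (23, 10, 4, 1):
        print(kept, mask_mantissa(0.1, kept))
```

For example, 1.75 is 1.11 in binary, so `mask_mantissa(1.75, 1)` keeps one fractional bit and yields 1.5. Fewer kept bits mean a coarser result; the sketch only models that numerical effect and makes no claim about the energy savings reported in the thesis.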