Videos

Conventional AI accelerators are limited by von Neumann bottlenecks for edge workloads. Domain-specific accelerators (often neuromorphic) address this by applying near/in-memory computing, NoC-interconnected massively multicore setups, and data-flow computation. This requires an effective mapping of neural networks (i.e., an assignment of network layers to cores) that balances resources/memory, computation, and NoC traffic. Here, we introduce a mapping called Snake for the predominant convolutional neural networks (CNNs). It exploits the feed-forward nature of CNNs by folding layers onto spatially adjacent cores...
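As an illustration of what assigning layers to spatially adjacent cores can look like, the following minimal sketch walks a 2D mesh in serpentine order and hands each layer a contiguous run of cores. It is only an assumption about the general idea, not the Snake mapping from the paper; the function names `snake_order` and `map_layers` and the per-layer core counts are hypothetical.

```python
def snake_order(rows, cols):
    """Return mesh coordinates (row, col) in serpentine order.

    Even rows run left to right, odd rows right to left, so every pair of
    consecutive coordinates is one NoC hop apart.
    """
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order


def map_layers(cores_per_layer, rows, cols):
    """Assign each layer a contiguous slice of the serpentine path.

    cores_per_layer is a hypothetical input: the number of cores each
    layer needs, in feed-forward order.
    """
    path = snake_order(rows, cols)
    if sum(cores_per_layer) > len(path):
        raise ValueError("mesh has too few cores for this network")
    mapping, start = {}, 0
    for layer, count in enumerate(cores_per_layer):
        mapping[layer] = path[start:start + count]
        start += count
    return mapping


if __name__ == "__main__":
    for layer, cores in map_layers([2, 3, 1], rows=2, cols=3).items():
        print(layer, cores)
```

Because consecutive layers occupy neighbouring stretches of the path, the last core of one layer sits one hop from the first core of the next in this sketch.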

The enormous amount of code required to design modern hardware implementations often leads to critical vulnerabilities being overlooked. Vulnerabilities that compromise the confidentiality of sensitive data, such as cryptographic keys, can have an especially severe impact on the trustworthiness of an entire system. A promising methodology to prevent such vulnerabilities is information flow analysis. Using this method, one can determine whether information from sensitive signals flows towards outputs or untrusted components of the system. Most of these analytical strategies rely on the non-interference property, which states that the untrusted targets must not be influenced by the source’s data; this property has been shown to be too inflexible for many applications. ...
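A strict non-interference check can be pictured as a reachability question on a signal dependency graph: if any path leads from a sensitive source to an untrusted sink, the check fails. The sketch below is a simplified structural illustration under that assumption, not the analysis proposed in the paper; the signal names and the `fanout` dictionary are hypothetical.

```python
from collections import deque


def tainted_signals(fanout, sources):
    """Return every signal structurally reachable from the given sources.

    fanout maps a signal name to the signals it drives. The result is a
    conservative over-approximation: a signal is marked as soon as any
    path from a source exists, whether or not data can actually leak.
    """
    seen = set(sources)
    queue = deque(sources)
    while queue:
        signal = queue.popleft()
        for driven in fanout.get(signal, ()):
            if driven not in seen:
                seen.add(driven)
                queue.append(driven)
    return seen


def violates_non_interference(fanout, sources, untrusted):
    """True if any untrusted signal is reachable from a sensitive source."""
    return bool(tainted_signals(fanout, sources) & set(untrusted))


if __name__ == "__main__":
    # Hypothetical design: the key feeds an S-box that drives the ciphertext.
    fanout = {"key": ["sbox"], "sbox": ["ciphertext"], "clk_en": ["status"]}
    print(violates_non_interference(fanout, ["key"], ["ciphertext"]))
```

In this toy design the ciphertext necessarily depends on the key, so a strict check flags it even though that dependence is intended, which hints at why the property can be too inflexible in practice.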

Executing neural network (NN) applications on general-purpose processors results in a large power and performance overhead due to the high cost of data movement between the processor and the main memory. Neuromorphic computing systems based on memristor crossbars perform the main NN operation, i.e., vector-matrix multiplication (VMM), efficiently in the analog domain. Thus, they circumvent the costly energy overhead of their digital counterparts. It can be expected that neuromorphic systems will initially be used as complements to current high-performance systems rather than as replacements....
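In an idealized crossbar, the VMM follows from Ohm's law and Kirchhoff's current law: applying voltage V_i to row i of a crossbar with conductances G_ij yields the column current I_j = sum_i V_i * G_ij. The sketch below computes this digitally for illustration; it assumes an ideal device (no wire resistance, nonlinearity, or noise), and the function name `crossbar_vmm` is hypothetical.

```python
def crossbar_vmm(voltages, conductances):
    """Column currents of an ideal memristor crossbar.

    voltages:     row input voltages V_i (volts)
    conductances: matrix G[i][j] of cell conductances (siemens)
    Returns the currents I_j = sum_i V_i * G[i][j] (amperes).
    """
    rows = len(conductances)
    if len(voltages) != rows:
        raise ValueError("need exactly one input voltage per crossbar row")
    cols = len(conductances[0])
    return [
        sum(voltages[i] * conductances[i][j] for i in range(rows))
        for j in range(cols)
    ]


if __name__ == "__main__":
    G = [[1.0, 0.5],
         [0.25, 2.0]]
    print(crossbar_vmm([1.0, 2.0], G))  # [1.5, 4.5]
```

In the analog array all column currents form at once, whereas the loop above has to read every matrix entry explicitly.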

Heterogeneous 3D/2.5D stacking allows tight coupling of components that are ideally integrated in different technologies, yielding advantages in nearly all design metrics. Massively parallel and scalable communication architectures between the components in such 3D ICs are commonly implemented through a Network-on-Chip (NoC). This paper contributes a systematic approach to improve the efficiency of NoCs for these heterogeneous 3D ICs. The core idea is a heterogeneous co-design of the NoC routing algorithm and router micro-architecture; the architectural and routing heterogeneity are derived from the physical implications of the different technologies. The proposed systematic approach enables a simultaneous improvement in the NoC power consumption, silicon footprint, and performance by 17 %, 45 %, and 52 %, respectively.

Published at ASP-DAC 2021.
In heterogeneous 3D System-on-Chips (SoCs), NoCs with uniform properties suffer from one major limitation: the clock frequency of routers varies due to different manufacturing technologies. We propose an efficient approach to bridge the frequency gap using a heterogeneous network architecture. We show that reducing the number of virtual channels (VCs) makes it possible to bridge a frequency gap of up to 2x. In case studies, we achieve a system-level latency improvement of up to 47% for uniform random traffic and up to 59% for PARSEC benchmarks, a maximum throughput increase of 50%, up to 68% reduced area, and 38% reduced power.
 

The everlasting demand for higher computing power for deep neural networks (DNNs) drives the development of parallel computing architectures. 3D integration, in which chips are integrated and connected vertically, can further increase performance because it introduces another level of spatial parallelism. Therefore, we analyze dataflows, performance, area, power, and temperature of such 3D-DNN-accelerators. Monolithic and TSV-based stacked 3D-ICs are compared against 2D-ICs. We identify workload properties and architectural parameters for efficient 3D-ICs and achieve up to 9.14x speedup of 3D vs. 2D. We discuss area-performance trade-offs. We demonstrate applicability, as the 3D-IC draws power similar to that of 2D-ICs and is not thermally limited.