Published at ASP-DAC 2021.
In heterogeneous 3D Systems-on-Chip (SoCs), networks-on-chip (NoCs) with uniform properties suffer from one major limitation: router clock frequencies vary because different manufacturing technologies are used. We propose an efficient approach that bridges this frequency gap using a heterogeneous network architecture. We show that reducing the number of virtual channels (VCs) makes it possible to bridge a frequency gap of up to 2x. In case studies, we achieve a system-level latency improvement of up to 47% for uniform random traffic and up to 59% for PARSEC benchmarks, a throughput increase of up to 50%, and reductions of up to 68% in area and 38% in power.
The ever-growing demand for higher computing power for deep neural networks (DNNs) drives the development of parallel computing architectures. 3D integration, in which chips are stacked and connected vertically, can further increase performance because it introduces another level of spatial parallelism. We therefore analyze the dataflows, performance, area, power, and temperature of such 3D DNN accelerators. Monolithic and TSV-based stacked 3D-ICs are compared against 2D-ICs. We identify workload properties and architectural parameters for efficient 3D-ICs and achieve a speedup of up to 9.14x for 3D vs. 2D. We discuss area-performance trade-offs. We demonstrate practical applicability: the 3D-IC draws power similar to that of 2D-ICs and is not thermally limited.