Publication

Accelerating Deep Learning Inference in Constrained Embedded Devices Using Hardware Loops and a Dot Product Unit

Authors:
Vreča, J., Sturm, K. J. X., Gungl, E., Merchant, F., Bientinesi, P., Leupers, R., Brezočnik, Z.
Journal:
IEEE Access
Date:
2020
DOI:
10.1109/ACCESS.2020.3022824
hsb:
RWTH-2020-10745
Language:
English

Abstract

Deep learning algorithms have seen success in a wide variety of applications, such as
machine translation, image and speech recognition, and self-driving cars. However, these algorithms have
only recently gained a foothold in the embedded systems domain. Most embedded systems are based
on cheap microcontrollers with limited memory capacity, and, thus, are typically seen as not capable of
running deep learning algorithms. Nevertheless, we consider that advancements in compression of neural
networks and neural network architecture, coupled with an optimized instruction set architecture, could make
microcontroller-grade processors suitable for specific low-intensity deep learning applications. We propose
a simple instruction set extension with two main components—hardware loops and dot product instructions.
To evaluate the effectiveness of the extension, we developed optimized assembly functions for the fully
connected and convolutional neural network layers. When using the extensions and the optimized assembly
functions, we achieve an average clock cycle count decrease of 73% for a small-scale convolutional neural
network. On a per-layer basis, our optimizations decrease the clock cycle count for fully connected layers and
convolutional layers by 72% and 78%, respectively. The average energy consumption per inference decreases
by 73%. We have shown that adding just hardware loops and dot product instructions has a significant positive
effect on processor efficiency in computing neural network functions.
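To illustrate the kind of kernel the extension targets, here is a minimal C sketch of a fully connected layer expressed as one dot product per output neuron. The 8-bit operand width, function names, and data layout are illustrative assumptions, not the paper's actual assembly implementation; the annotated inner loop marks where the proposed hardware loop and dot product instruction would apply.

```c
#include <stdint.h>
#include <stddef.h>

/* Dot product of two int8 vectors with a 32-bit accumulator. The inner
 * loop is exactly the pattern the extension targets: a dot product
 * instruction collapses the multiply-accumulate into one operation, and
 * a hardware loop removes the per-iteration branch and counter overhead. */
static int32_t dot_product(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)   /* candidate for a hardware loop */
        acc += (int32_t)a[i] * b[i]; /* candidate for a dot product insn */
    return acc;
}

/* out[j] = bias[j] + dot(row j of weights, in) for each output neuron. */
void fully_connected(const int8_t *in, const int8_t *weights,
                     const int32_t *bias, int32_t *out,
                     size_t n_in, size_t n_out)
{
    for (size_t j = 0; j < n_out; j++)
        out[j] = bias[j] + dot_product(&weights[j * n_in], in, n_in);
}
```

The inner accumulation of a convolutional layer has the same multiply-accumulate shape, which is why both layer types benefit from the same two primitives.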
