Publication: Achieving Efficient QR Factorization by Algorithm-Architecture Co-design of Householder Transformation 

Authors:
Merchant, F., Vatwani, T., Chattopadhyay, A., Raha, S., Nandy, S. K., Narayan, R.
Book Title:
VLSI Design
Pages:
pp. 98-103
Date:
2016
ISBN:
978-1-46738-700-2
DOI:
10.1109/VLSID.2016.109
Language:
English

Abstract

Householder Transformation (HT) is a key building block of widely used numerical linear algebra primitives such as QR factorization. Despite years of intense research on HT, there is still scope to expose more Instruction-Level Parallelism (ILP) through algorithmic transformations. In this paper, we propose several novel algorithmic transformations of HT that expose higher ILP. Our propositions are backed by theoretical proofs and a series of experiments on commercial general-purpose processors. Finally, we show that algorithm-architecture co-design leads to the most efficient realization of HT. A detailed experimental study with architectural modifications is presented for a commercial CGRA (Coarse-Grained Reconfigurable Architecture). Benchmarking against some recent HT implementations shows a 30-40% improvement in performance.
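
For readers unfamiliar with the primitive the abstract refers to, the following is a minimal sketch of the classical textbook Householder QR factorization. It is not the modified algorithm proposed in the paper, whose transformations are not described in this record. The function name `householder_qr` and the list-of-lists matrix representation are assumptions made for illustration only.

```python
import math


def householder_qr(a):
    """Classical Householder QR (textbook version, not the paper's variant).

    Takes an m x n matrix as a list of rows of floats and returns (q, r)
    such that a = q * r, q is m x m orthogonal, and r is upper triangular.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]  # working copy that becomes R
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for k in range(min(m - 1, n)):
        # Norm of the part of column k on and below the diagonal.
        norm_x = math.sqrt(sum(r[i][k] ** 2 for i in range(k, m)))
        if norm_x == 0.0:
            continue  # column already zero, nothing to reflect

        # Householder vector v = x - alpha * e1, with the sign of alpha
        # chosen opposite to x[0] to avoid cancellation.
        alpha = -math.copysign(norm_x, r[k][k])
        v = [r[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)

        # R <- H * R, where H = I - 2 v v^T / (v^T v), on rows k..m-1.
        for j in range(n):
            s = sum(v[i - k] * r[i][j] for i in range(k, m))
            f = 2.0 * s / vtv
            for i in range(k, m):
                r[i][j] -= f * v[i - k]

        # Q <- Q * H, on columns k..m-1 (H is symmetric and orthogonal).
        for i in range(m):
            s = sum(q[i][j] * v[j - k] for j in range(k, m))
            f = 2.0 * s / vtv
            for j in range(k, m):
                q[i][j] -= f * v[j - k]

    return q, r


if __name__ == "__main__":
    a = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
    q, r = householder_qr(a)
    for row in r:
        print(["%.4f" % x for x in row])
```

In this classical form, each step computes a norm, builds the vector `v`, and only then updates the trailing matrix, so the steps depend on one another in sequence. That dependency chain is the kind of limit on instruction-level parallelism the abstract alludes to.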
