Affine Vector Cache for memory bandwidth savings

Sylvain Collange 1, 2, *, Alexandre Kouyoumdjian 1, 2
* Corresponding author
1 ARENAIRE - Computer arithmetic
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract : Preserving memory locality is a major issue in highly multithreaded architectures such as GPUs. These architectures hide latency by maintaining a large number of threads in flight. As each thread needs to maintain a private working set, all threads collectively put tremendous pressure on on-chip memory arrays, at significant cost in area and power. We show that thread-private data in GPU-like implicit SIMD architectures can be compressed by a factor of up to 16 by taking advantage of correlations between values held by different threads. We propose the Affine Vector Cache (AVC), a compressed cache design that complements the first-level cache. Evaluation by simulation on the SDK and Rodinia benchmarks shows that a 32KB L1 cache assisted by a 16KB AVC offers a 59% larger usable capacity on average than a single 48KB L1 cache. This results in an overall performance increase of 5.7% along with an energy reduction of 11%, for a negligible hardware cost.
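The kind of inter-thread correlation the abstract refers to can be illustrated with a minimal sketch. It assumes, purely for illustration, that a vector of per-thread values is "affine" when lane i holds base + i * stride, that values are 32 bits wide, and that a compressed entry stores only the base and the stride. None of these details are stated in the abstract; the function names and the 32-lane width are hypothetical. Under those assumptions a 32-lane vector shrinks from 128 bytes to 8 bytes, which would be consistent with the quoted factor of up to 16.

```python
from typing import List, Optional, Sequence, Tuple

WORD_BYTES = 4  # assumption: 32-bit values per thread


def compress_affine(values: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (base, stride) if values[i] == base + i * stride for every lane.

    Returns None when the vector is not affine and would have to be stored
    uncompressed. Hypothetical helper, not the report's actual encoding.
    Fixed-width wraparound is ignored for simplicity.
    """
    if not values:
        return None
    base = values[0]
    stride = values[1] - values[0] if len(values) > 1 else 0
    for lane, value in enumerate(values):
        if value != base + lane * stride:
            return None
    return base, stride


def decompress_affine(base: int, stride: int, width: int) -> List[int]:
    """Rebuild the per-lane values from a (base, stride) pair."""
    return [base + lane * stride for lane in range(width)]


def compression_ratio(width: int) -> float:
    """Uncompressed size over compressed size, assuming two words per entry."""
    return (width * WORD_BYTES) / (2 * WORD_BYTES)


# Per-thread addresses of consecutive 4-byte elements form an affine vector.
addresses = [0x1000 + 4 * lane for lane in range(32)]
packed = compress_affine(addresses)
```

A vector in which every lane holds the same value is the special case with a stride of zero, so the same check covers uniform data as well as regularly strided indices and addresses.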
Document type :
Reports
Cited literature [28 references]

https://hal-ens-lyon.archives-ouvertes.fr/ensl-00649200
Contributor : Sylvain Collange
Submitted on : Wednesday, December 7, 2011 - 12:16:44 PM
Last modification on : Thursday, February 7, 2019 - 3:06:48 PM
Long-term archiving on : Friday, November 16, 2012 - 2:40:37 PM

File

affinecache_tr.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : ensl-00649200, version 1

Citation

Sylvain Collange, Alexandre Kouyoumdjian. Affine Vector Cache for memory bandwidth savings. 2011. ⟨ensl-00649200⟩
