Adaptive Cache Compression for High-Performance Processors, ACM SIGARCH Computer Architecture News, vol.32, issue.2, pp.212-223, 2004.
DOI : 10.1145/1028176.1006719
Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797
Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010.
DOI : 10.1109/MASCOTS.2010.43
Power Consumption of GPUs from a Software Perspective, ICCS 2009, pp.922-931, 2009.
DOI : 10.1007/978-3-642-01970-8_92
URL : https://hal.archives-ouvertes.fr/hal-00348672
Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, Euro-Par 2009 Workshops, 3rd Workshop on Highly Parallel Processing on a Chip (HPPC), LNCS, vol.6043, pp.46-55, 2009.
DOI : 10.1007/978-3-642-14122-5_8
URL : https://hal.archives-ouvertes.fr/hal-00396719
Divergence Analysis and Optimizations, 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT), 2011.
DOI : 10.1109/PACT.2011.63
Multithreaded instruction sharing, 2010.
Evolution of AMD's graphics core, and preview of Graphics Core Next, AMD Fusion Developer Summit keynote, 2011.
Zero-content augmented caches, Proceedings of the 23rd International Conference on Supercomputing, ICS '09, pp.46-55, 2009.
DOI : 10.1145/1542275.1542288
URL : https://hal.archives-ouvertes.fr/inria-00337742
Bandwidth compression for shader engine store operations, US Patent 7886116, assignee NVIDIA, 2011.
A fully bypassed six-issue integer datapath and register file on the Itanium-2 microprocessor, IEEE Journal of Solid-State Circuits, vol.37, issue.11, pp.1433-1440, 2002.
DOI : 10.1109/JSSC.2002.803948
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.407-420, 2007.
DOI : 10.1109/MICRO.2007.30
Understanding throughput-oriented architectures, Communications of the ACM, vol.53, issue.11, pp.58-66, 2010.
DOI : 10.1145/1839676.1839694
Energy-efficient mechanisms for managing thread context in throughput processors, Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA '11), pp.235-246, 2011.
An integrated GPU power and performance model, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.280-289, 2010.
DOI : 10.1145/1816038.1815998
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.332.3923
Silent stores and store value locality, IEEE Transactions on Computers, vol.50, issue.11, pp.1174-1190, 2001.
DOI : 10.1109/12.966493
NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008.
DOI : 10.1109/MM.2008.31
Exploiting inter-thread temporal locality for chip multithreading, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
3D finite difference computation on GPUs using CUDA, Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2, pp.79-84, 2009.
DOI : 10.1145/1513895.1513905
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.472.447
The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010.
DOI : 10.1109/MM.2010.41
Robust Energy-Efficient Adder Topologies, 18th IEEE Symposium on Computer Arithmetic (ARITH '07), pp.16-28, 2007.
DOI : 10.1109/ARITH.2007.31
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.6716
The performance impact of block sizes and fetch strategies, ACM SIGARCH Computer Architecture News, vol.18, issue.3, pp.160-169, 1990.
DOI : 10.1145/325096.325135
Larrabee: A Many-Core x86 Architecture for Visual Computing, ACM Transactions on Graphics, vol.27, issue.3, pp.1-15, 2008.
DOI : 10.1145/1360612.1360617
Concurrent support of multiple page sizes on a skewed associative TLB, IEEE Transactions on Computers, vol.53, issue.7, pp.924-927, 2004.
DOI : 10.1109/TC.2004.21
A compressed frame buffer to reduce display power consumption in mobile systems, Proceedings of the 2004 Asia and South Pacific Design Automation Conference, ASP-DAC '04, pp.818-823, 2004.
Benchmarking GPUs to tune dense linear algebra, SC 2008, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2008.
DOI : 10.1109/SC.2008.5214359
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.218.3436
Demystifying GPU microarchitecture through microbenchmarking, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), 2010.
DOI : 10.1109/ISPASS.2010.5452013
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.189.5309
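Frequent value locality and value-centric data cache design, Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), pp.150-159, 2000.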
DOI : 10.1145/378993.379235
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.5641