IBM Power5 chip: a dual-core multithreaded processor, IEEE Micro, vol.24, issue.2, pp.40-47, 2004. ,
DOI : 10.1109/MM.2004.1289290
Niagara: A 32-Way Multithreaded Sparc Processor, IEEE Micro, vol.25, issue.2, pp.21-29, 2005. ,
DOI : 10.1109/MM.2005.35
An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches, ASPLOS'02 ,
DOI : 10.1145/635508.605420
URL : http://www.eecg.toronto.edu/%7Emoshovos/ACA09/reading/nuca.pdf
Managing Wire Delay in Large Chip-Multiprocessor Caches, 37th International Symposium on Microarchitecture (MICRO-37'04) ,
DOI : 10.1109/MICRO.2004.21
URL : http://www.microarch.org/micro37/papers/28_Beckmann-ManagingWire.pdf
LAWRA: Linear Algebra with Recursive Algorithms, PARA '00, pp.38-51, 2001. ,
DOI : 10.1007/3-540-70734-4_7
Some efficient solutions to the affine scheduling problem. I. One-dimensional time, International Journal of Parallel Programming, vol.21, issue.5, pp.313-348, 1992. ,
DOI : 10.1145/360827.360844
Constraint-based array dependence analysis, ACM Transactions on Programming Languages and Systems, vol.20, issue.3, pp.635-678, 1998. ,
DOI : 10.1145/291889.291900
URL : http://www.cs.umd.edu/Library/TRs/CS-TR-3372/CS-TR-3372.ps.Z
Minimizing communication while preserving parallelism, Proceedings of the 10th international conference on Supercomputing, ICS '96, pp.52-60, 1996. ,
DOI : 10.1145/237578.237585
URL : http://www.cs.umd.edu/Library/TRs/CS-TR-3571/CS-TR-3571.ps.Z
Generation of efficient nested loops from polyhedra, Intl. J. of Parallel Programming, vol.28, issue.5, pp.469-498, 2000. ,
CLooG: The Chunky Loop Generator ,
Iterative optimization in the polyhedral model: Part II, multidimensional time, PLDI'08, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-01257273
A practical automatic polyhedral parallelizer and locality optimizer, PLDI '08, pp.101-113, 2008. ,
DOI : 10.1145/1379022.1375595
URL : http://www.cse.ohio-state.edu/~bondhugu/publications/uday-pldi08.pdf
Data and computation transformations for multiprocessors, PPOPP'95 ,
DOI : 10.1145/209936.209954
URL : http://suif.stanford.edu/papers/anderson95.ps
Data transformations for eliminating conflict misses, PLDI'98 ,
DOI : 10.1145/277650.277661
URL : http://www.cs.umd.edu/projects/cosmic/papers/pldi98.ps
Simics full system simulator ,
PhD Dissertation, 1997. [Online]. Available: http://suif ,
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset, ACM SIGARCH Computer Architecture News, vol.33, issue.4, pp.92-99, 2005. ,
DOI : 10.1145/1105734.1105747
URL : http://www.cs.wisc.edu/multifacet/papers/can05_gems.pdf
Virtual hierarchies to support server consolidation, ISCA'07 ,
DOI : 10.1145/1273440.1250670
URL : http://www.cs.berkeley.edu/~randy/Courses/CS294.F07/24.1.pdf
Accelerator: using data parallelism to program GPUs for general-purpose uses, ASPLOS'06 ,
ZPL: a machine independent programming language for parallel computers, IEEE Transactions on Software Engineering, vol.26, issue.3, pp.197-211, 2000. ,
DOI : 10.1109/32.842947
Interconnect-power dissipation in a microprocessor, Proceedings of the 2004 international workshop on System level interconnect prediction, SLIP '04, 2004. ,
DOI : 10.1145/966747.966750
URL : http://www.ee.technion.ac.il/people/kolodny/ftp/magen_slip_revised.pdf
Dynamic voltage scaling with links for power optimization of interconnection networks, Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA-9), 2003. ,
DOI : 10.1109/HPCA.2003.1183527
Reducing NoC energy consumption through compiler-directed channel voltage scaling, PLDI'06 ,
DOI : 10.1145/1133255.1134004
Profile-driven energy reduction in network-on-chips, PLDI'07 ,
DOI : 10.1145/1250734.1250779
A case for user-level dynamic page migration, ICS'00 ,
Scheduling and page migration for multiprocessor compute servers, ASPLOS'94 ,
DOI : 10.1145/195473.195485
URL : http://www4.informatik.uni-erlangen.de/~tsthiel/Papers/Flash-schedule-asplos95.ps.gz
Optimizing data locality by array restructuring, Dept. Computer Science, 1995. ,
Non-singular data transformations: definition, validity, applications, CPC'96 ,
A linear algebra framework for automatic determination of optimal data layouts, IEEE Transactions on Parallel and Distributed Systems, vol.10, issue.2, pp.115-135, 1999. ,
DOI : 10.1109/71.752779
Static and dynamic locality optimizations using integer linear programming, IEEE Transactions on Parallel and Distributed Systems, vol.12, issue.9, pp.922-941, 2001. ,
DOI : 10.1109/TPDS.2001.1184186
URL : http://www.ece.northwestern.edu/~choudhar/publications/pdf/KanBan01A.pdf
Improving locality using loop and data transformations in an integrated framework, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture ,
DOI : 10.1109/MICRO.1998.742790
Automatic array alignment in data-parallel programs, Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '93, pp.16-28, 1993. ,
DOI : 10.1145/158511.158517
URL : ftp://ftp.cs.unc.edu/pub/users/sc/papers/popl93.pdf
Maps, ISCA'99 ,
DOI : 10.1145/307338.300980
Custom data layout for memory parallelism, CGO'04 ,
Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, SPAA '99 ,
DOI : 10.1145/305619.305645
Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, ICS '99 ,
DOI : 10.1145/305138.305231
Cache-efficient matrix transposition, Proceedings of the Sixth International Symposium on High-Performance Computer Architecture (HPCA-6) ,
DOI : 10.1109/HPCA.2000.824350
Maximizing parallelism and minimizing synchronization with affine partitions, Parallel Computing, vol.24, issue.3-4, pp.445-475, 1998. ,
DOI : 10.1016/S0167-8191(98)00021-0
URL : http://suif.stanford.edu//papers/lim97.ps
An affine partitioning algorithm to maximize parallelism and minimize communication, Proceedings of the 13th international conference on Supercomputing, ICS '99, pp.228-237, 1999. ,
DOI : 10.1145/305138.305197