R. N. Kalla, B. Sinharoy, and J. M. Tendler, IBM Power5 chip: a dual-core multithreaded processor, IEEE Micro, vol.24, issue.2, pp.40-47, 2004.
DOI : 10.1109/MM.2004.1289290

P. Kongetira, K. Aingaran, and K. Olukotun, Niagara: A 32-Way Multithreaded Sparc Processor, IEEE Micro, vol.25, issue.2, pp.21-29, 2005.
DOI : 10.1109/MM.2005.35

C. Kim, D. Burger, and S. W. Keckler, An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches, ASPLOS'02
DOI : 10.1145/635508.605420

URL : http://www.eecg.toronto.edu/%7Emoshovos/ACA09/reading/nuca.pdf

B. M. Beckmann and D. A. Wood, Managing Wire Delay in Large Chip-Multiprocessor Caches, 37th International Symposium on Microarchitecture (MICRO-37'04)
DOI : 10.1109/MICRO.2004.21

URL : http://www.microarch.org/micro37/papers/28_Beckmann-ManagingWire.pdf

B. S. Andersen, F. G. Gustavson, A. Karaivanov, M. Marinova, J. Waśniewski et al., LAWRA: Linear Algebra with Recursive Algorithms, PARA '00, pp.38-51, 2001.
DOI : 10.1007/3-540-70734-4_7

P. Feautrier, Some efficient solutions to the affine scheduling problem. I. One-dimensional time, International Journal of Parallel Programming, vol.21, issue.5, pp.313-347, 1992.
DOI : 10.1007/BF01407835

W. Pugh and D. Wonnacott, Constraint-based array dependence analysis, ACM Transactions on Programming Languages and Systems, vol.20, issue.3, pp.635-678, 1998.
DOI : 10.1145/291889.291900

URL : http://www.cs.umd.edu/Library/TRs/CS-TR-3372/CS-TR-3372.ps.Z

W. Kelly and W. Pugh, Minimizing communication while preserving parallelism, Proceedings of the 10th international conference on Supercomputing, ICS '96, pp.52-60, 1996.
DOI : 10.1145/237578.237585

URL : http://www.cs.umd.edu/Library/TRs/CS-TR-3571/CS-TR-3571.ps.Z

F. Quilleré, S. V. Rajopadhye, and D. Wilde, Generation of efficient nested loops from polyhedra, Intl. J. of Parallel Programming, vol.28, issue.5, pp.469-498, 2000.

CLooG: The Chunky Loop Generator.

L. Pouchet, C. Bastoul, J. Cavazos, and A. Cohen, Iterative optimization in the polyhedral model: Part II, multidimensional time, PLDI'08, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01257273

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, PLDI '08, pp.101-113, 2008.
DOI : 10.1145/1379022.1375595

URL : http://www.cse.ohio-state.edu/~bondhugu/publications/uday-pldi08.pdf

J. M. Anderson, S. P. Amarasinghe, and M. S. Lam, Data and computation transformations for multiprocessors, PPOPP'95
DOI : 10.1145/209936.209954

URL : http://suif.stanford.edu/papers/anderson95.ps

G. Rivera and C. Tseng, Data transformations for eliminating conflict misses, PLDI'98
DOI : 10.1145/277650.277661

URL : http://www.cs.umd.edu/projects/cosmic/papers/pldi98.ps

S. Amarasinghe, PhD Dissertation, 1997. [Online]. Available: http://suif

Simics full system simulator.

M. M. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu et al., Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset, ACM SIGARCH Computer Architecture News, vol.33, issue.4, pp.92-99, 2005.
DOI : 10.1145/1105734.1105747

URL : http://www.cs.wisc.edu/multifacet/papers/can05_gems.pdf

M. R. Marty and M. D. Hill, Virtual hierarchies to support server consolidation, ISCA'07
DOI : 10.1145/1273440.1250670

URL : http://www.cs.berkeley.edu/~randy/Courses/CS294.F07/24.1.pdf

D. Tarditi, S. Puri, and J. Oglesby, Accelerator: using data parallelism to program GPUs for general-purpose uses, ASPLOS'06

B. L. Chamberlain, S. Choi, E. C. Lewis, C. Lin, L. Snyder et al., ZPL: a machine independent programming language for parallel computers, IEEE Transactions on Software Engineering, vol.26, issue.3, pp.197-211, 2000.
DOI : 10.1109/32.842947

N. Magen, A. Kolodny, U. Weiser, and N. Shamir, Interconnect-power dissipation in a microprocessor, Proceedings of the 2004 international workshop on System level interconnect prediction, SLIP '04, 2004.
DOI : 10.1145/966747.966750

URL : http://www.ee.technion.ac.il/people/kolodny/ftp/magen_slip_revised.pdf

L. Shang, L. Peh, and N. K. Jha, Dynamic voltage scaling with links for power optimization of interconnection networks, Proceedings of the Ninth International Symposium on High-Performance Computer Architecture, HPCA-9, 2003.
DOI : 10.1109/HPCA.2003.1183527

G. Chen, F. Li, M. Kandemir, and M. J. Irwin, Reducing NoC energy consumption through compiler-directed channel voltage scaling, PLDI'06
DOI : 10.1145/1133255.1134004

F. Li, G. Chen, M. Kandemir, and I. Kolcu, Profile-driven energy reduction in network-on-chips, PLDI'07
DOI : 10.1145/1250734.1250779

D. S. Nikolopoulos, T. S. Papatheodorou, C. D. Polychronopoulos, J. Labarta, and E. Ayguadé, A case for user-level dynamic page migration, ICS'00

R. Chandra, S. Devine, B. Verghese, A. Gupta, and M. Rosenblum, Scheduling and page migration for multiprocessor compute servers, ASPLOS'94
DOI : 10.1145/195473.195485

URL : http://www4.informatik.uni-erlangen.de/~tsthiel/Papers/Flash-schedule-asplos95.ps.gz

S. Leung and J. Zahorjan, Optimizing data locality by array restructuring, Dept. Computer Science, 1995.

M. F. O'Boyle and P. M. Knijnenburg, Non-singular data transformations: definition, validity, applications, CPC'96

M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, A linear algebra framework for automatic determination of optimal data layouts, IEEE Transactions on Parallel and Distributed Systems, vol.10, issue.2, pp.115-135, 1999.
DOI : 10.1109/71.752779

M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and E. Ayguade, Static and dynamic locality optimizations using integer linear programming, IEEE Transactions on Parallel and Distributed Systems, vol.12, issue.9, pp.922-941, 2001.
DOI : 10.1109/TPDS.2001.1184186

URL : http://www.ece.northwestern.edu/~choudhar/publications/pdf/KanBan01A.pdf

M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, Improving locality using loop and data transformations in an integrated framework, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, MICRO-31, 1998.
DOI : 10.1109/MICRO.1998.742790

S. Chatterjee, J. R. Gilbert, R. Schreiber, and S. Teng, Automatic array alignment in data-parallel programs, Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '93, pp.16-28, 1993.
DOI : 10.1145/158511.158517

URL : ftp://ftp.cs.unc.edu/pub/users/sc/papers/popl93.pdf

R. Barua, W. Lee, S. Amarasinghe, and A. Agarwal, Maps, ISCA'99
DOI : 10.1145/307338.300980

B. So, M. W. Hall, and H. E. Ziegler, Custom data layout for memory parallelism, CGO'04

S. Chatterjee, A. R. Lebeck, P. K. Patnala, and M. Thottethodi, Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, SPAA '99
DOI : 10.1145/305619.305645

S. Chatterjee, V. V. Jain, A. R. Lebeck, S. Mundhra, and M. Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, ICS '99
DOI : 10.1145/305138.305231

S. Chatterjee and S. Sen, Cache-efficient matrix transposition, Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, HPCA-6, 2000.
DOI : 10.1109/HPCA.2000.824350

A. W. Lim and M. S. Lam, Maximizing parallelism and minimizing synchronization with affine partitions, Parallel Computing, vol.24, issue.3-4, pp.445-475, 1998.
DOI : 10.1016/S0167-8191(98)00021-0

URL : http://suif.stanford.edu//papers/lim97.ps

A. W. Lim, G. I. Cheong, and M. S. Lam, An affine partitioning algorithm to maximize parallelism and minimize communication, Proceedings of the 13th international conference on Supercomputing, ICS '99, pp.228-237, 1999.
DOI : 10.1145/305138.305197