D. A. Bader, V. Kanade, and K. Madduri, SWARM: A Parallel Programming Framework for Multicore Processors, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007.
DOI : 10.1109/IPDPS.2007.370681

G. E. Blelloch, R. A. Chowdhury, P. B. Gibbons, V. Ramachandran, S. Chen et al., Provably good multicore cache performance for divide-and-conquer algorithms, SODA'08, the 19th ACM-SIAM symposium on Discrete algorithms, pp.501-510, 2008.

L. E. Cannon, A cellular computer to implement the Kalman filter algorithm, 1969.

M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran, Cache-oblivious algorithms, FOCS'99, the 40th IEEE Symposium on Foundations of Computer Science, pp.285-298, 1999.

D. Irony, S. Toledo, and A. Tiskin, Communication lower bounds for distributed-memory matrix multiplication, Journal of Parallel and Distributed Computing, vol.64, issue.9, pp.1017-1026, 2004.
DOI : 10.1016/j.jpdc.2004.03.021

J. Pineau, Y. Robert, F. Vivien, and J. Dongarra, Matrix product on heterogeneous master-worker platforms, PPoPP'2008, the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.53-62, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00803487

S. Toledo, A survey of out-of-core algorithms in numerical linear algebra, External Memory Algorithms and Visualization, pp.161-180, 1999.

V. Volkov and J. W. Demmel, Benchmarking GPUs to tune dense linear algebra, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2008.
DOI : 10.1109/SC.2008.5214359

S. Zuckerman, M. Pérache, and W. Jalby, Fine Tuning Matrix Multiplications on Multicore, High Performance Computing HiPC'08, pp.30-41, 2008.
DOI : 10.1007/978-3-540-85451-7_9