SWARM: A Parallel Programming Framework for Multicore Processors, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007.
DOI : 10.1109/IPDPS.2007.370681
Provably good multicore cache performance for divide-and-conquer algorithms, SODA'08, the 19th ACM-SIAM symposium on Discrete algorithms, pp.501-510, 2008.
A cellular computer to implement the Kalman filter algorithm, 1969.
Cache-oblivious algorithms, FOCS'99, the 40th IEEE Symposium on Foundations of Computer Science, pp.285-298, 1999.
Communication lower bounds for distributed-memory matrix multiplication, Journal of Parallel and Distributed Computing, vol.64, issue.9, pp.1017-1026, 2004.
DOI : 10.1016/j.jpdc.2004.03.021
Matrix product on heterogeneous master-worker platforms, PPoPP'2008, the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.53-62, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00803487
A survey of out-of-core algorithms in numerical linear algebra, External Memory Algorithms and Visualization, pp.161-180, 1999.
Benchmarking GPUs to tune dense linear algebra, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2008.
DOI : 10.1109/SC.2008.5214359
Fine Tuning Matrix Multiplications on Multicore, High Performance Computing HiPC'08, pp.30-41, 2008.
DOI : 10.1007/978-3-540-85451-7_9