| Article Id: |
 |
ensl-00391185, version 2 |
 |
 |
| Domain: |
 |
Computer Science/Computer Arithmetic
|
 |
 |
| Title: |
 |
Optimizing correctly-rounded reciprocal square roots for embedded VLIW cores |
 |
 |
| Author(s): |
 |
Claude-Pierre Jeannerod (1, 2), Guillaume Revy (1, 2) |
 |
 |
| Laboratory: |
 |
| 1: |
Inria Grenoble Rhône-Alpes / LIP Laboratoire de l'Informatique du Parallélisme - ARENAIRE |
 |
| 2: |
LIP - Laboratoire de l'Informatique du Parallélisme |
|
 |
 |
| Research team: |
 |
[ARENAIRE - Arithmétique des ordinateurs] |
| Abstract: |
 |
This paper presents an optimized software implementation of the reciprocal square root function for IEEE binary32 floating-point data, with correct rounding to nearest. The main feature of this implementation is its high exposure of instruction-level parallelism (ILP), which results from an extension of the polynomial evaluation-based method of [JeKnMoRe08] and from the design of a specific rounding procedure. The implementation proves very efficient on some VLIW processor cores such as STMicroelectronics' ST231 (used mainly for embedded media processing), where a low latency of 29 cycles has been measured. |
 |
 |
 |
| Full-text language: |
 |
English |
 |
 |
| Keywords: |
 |
Reciprocal square root – Binary floating-point arithmetic – Correct rounding – Polynomial evaluation – Software implementation – VLIW processor core |
 |
 |
| Classification: |
 |
B.2.4, G.1.9, G.4
 |
 |
| Internal reference: |
 |
LIP research report RR2009-21 |
 |
 |