Optimizing correctly-rounded reciprocal square roots for embedded VLIW cores

Claude-Pierre Jeannerod (1, 2), Guillaume Revy (1, 2)
1 ARENAIRE - Computer arithmetic
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract: This paper presents an optimized software implementation of the reciprocal square root function for IEEE binary32 floating-point data, with correct rounding to nearest. The main feature of this implementation is its high exposure of instruction-level parallelism (ILP), which results here both from an extension of the polynomial evaluation-based method of [JeKnMoRe08] and from the design of a specific rounding procedure. This implementation proves to be very efficient on some VLIW processor cores such as STMicroelectronics' ST231 (used mainly for embedded media processing), where a low latency of 29 cycles has been measured.
Document type:
Conference paper
Asilomar Conference on Signals, Systems, and Computers, Nov 2009, United States. IEEE Signal Processing Society, 2009

Cited literature: 10 references

https://hal-ens-lyon.archives-ouvertes.fr/ensl-00391185
Contributor: Claude-Pierre Jeannerod
Submitted on: Wednesday, 25 November 2009, 11:25:32
Last modified on: Friday, 20 April 2018, 15:44:24
Archived on: Saturday, 26 November 2016, 16:15:25

File

JeannerodRevyAsilomar09-finalv...
Files produced by the author(s)

Identifiers

  • HAL Id: ensl-00391185, version 2


Citation

Claude-Pierre Jeannerod, Guillaume Revy. Optimizing correctly-rounded reciprocal square roots for embedded VLIW cores. Asilomar Conference on Signals, Systems, and Computers, Nov 2009, United States. IEEE Signal Processing Society, 2009. 〈ensl-00391185v2〉


Metrics

Record views: 194
File downloads: 228