Conference paper. Year: 2012

Mixed-precision Fused Multiply and Add


Abstract

The standard floating-point fused multiply and add (FMA) computes R=AB+C with a single rounding. This article investigates a variant of this operator in which the addend C and the result R are of a larger format, for instance binary64 (double precision), while the multiplier inputs A and B are of a smaller format, for instance binary32 (single precision). With minor modifications, this operator can also perform the standard FMA in the smaller format and the standard addition in the larger format. For sum-of-product applications, the proposed mixed-precision FMA provides the accumulation accuracy of the larger format at a cost close to that of a classical FMA in the smaller format. It is also fully compatible with existing arithmetic and language standards. The architectural cost of this operator is analysed in detail. An implementation of a mixed binary32/binary64 operator fully supporting subnormal numbers, binary64 addition and binary32 FMA is demonstrated and evaluated: its area overhead is one third compared with a classical binary32 FMA. Similarly, in high-end processors, a mixed binary64/binary128 FMA could provide an adequate solution to the binary128 requirements of very large scale computing applications.
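The arithmetic behaviour described in the abstract can be emulated in software. The following is a minimal sketch, not the hardware operator evaluated in the paper. It relies on the fact that the product of two binary32 values is exactly representable in binary64 (two 24-bit significands give at most 48 bits, and the exponent range fits), so the final binary64 addition is the only rounding. The helper names `to_binary32`, `mixed_fma`, `dot_mixed` and `dot_binary32` are hypothetical and chosen for illustration. The binary32 baseline rounds after the multiplication and again after the addition, so it models a plain multiply followed by an add rather than a binary32 FMA.

```python
import struct


def to_binary32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value.

    Assumption: x is within binary32 range; struct raises OverflowError otherwise.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_fma(a: float, b: float, c: float) -> float:
    """Emulate R = A*B + C with A, B in binary32 and C, R in binary64."""
    a32, b32 = to_binary32(a), to_binary32(b)
    # Exact in binary64: a 24-bit by 24-bit significand product needs at most 48 bits.
    product = a32 * b32
    # The binary64 addition is the single rounding of the whole operation.
    return product + c


def dot_mixed(xs, ys) -> float:
    """Sum of products with binary32 inputs and a binary64 accumulator."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = mixed_fma(x, y, acc)
    return acc


def dot_binary32(xs, ys) -> float:
    """Baseline: every product and every partial sum is rounded to binary32."""
    acc = 0.0
    for x, y in zip(xs, ys):
        p = to_binary32(to_binary32(x) * to_binary32(y))
        acc = to_binary32(acc + p)
    return acc


if __name__ == "__main__":
    # Each small product is 2**-25, below half an ulp of 1.0 in binary32.
    xs = [1.0] + [2.0 ** -12] * 4
    ys = [1.0] + [2.0 ** -13] * 4
    print(dot_mixed(xs, ys) - 1.0)     # contribution of the small terms is kept
    print(dot_binary32(xs, ys) - 1.0)  # small terms are absorbed one by one
```

In this example the four small products are each lost when accumulated in binary32, while the binary64 accumulator retains their sum, which illustrates the accumulation-accuracy argument made for sum-of-product applications.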
Main file: mpfma.pdf (127.87 KB)
Origin : Files produced by the author(s)

Dates and versions

ensl-00642157, version 1 (17-11-2011)

Identifiers

  • HAL Id: ensl-00642157, version 1

Cite

Nicolas Brunie, Florent de Dinechin, Benoît de Dinechin. Mixed-precision Fused Multiply and Add. 45th Asilomar Conference on Signals, Systems & Computers, Nov 2011, United States. pp.165-169. ⟨ensl-00642157⟩