Conference papers

Mixed-precision Fused Multiply and Add

Nicolas Brunie (1, 2, 3, 4); Florent de Dinechin (2, 3, 4); Benoît de Dinechin (1)
2 ARENAIRE - Computer arithmetic
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
3 ARIC - Arithmetic and Computing
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract : The standard floating-point fused multiply and add (FMA) computes R=AB+C with a single rounding. This article investigates a variant of this operator where the addend C and the result R are of a larger format, for instance binary64 (double precision), while the multiplier inputs A and B are of a smaller format, for instance binary32 (single precision). With minor modifications, this operator is also able to perform the standard FMA in the smaller format, and the standard addition in the larger format. For sum-of-product applications, the proposed mixed-precision FMA provides the accumulation accuracy of the larger format, at a cost that is close to that of a classical FMA in the smaller format. Besides, it is fully compatible with existing arithmetic and language standards. The architectural cost of this operator is analysed in detail. An implementation of a mixed binary32/binary64 operator fully supporting subnormal numbers, binary64 addition and binary32 FMA is demonstrated and evaluated: its area overhead is one third over the classical binary32 FMA. Similarly, in high-end processors, a mixed binary64/binary128 FMA could provide an adequate solution to the binary128 requirements of very large scale computing applications.
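The behaviour described in the abstract can be emulated in software, which may help in seeing why the operator matters for sums of products. The sketch below is only an illustration, not the hardware operator evaluated in the paper. The names `to_binary32`, `mixed_fma`, `dot_mixed` and `dot_binary32` are hypothetical, and the code assumes round-to-nearest with no overflow. It relies on the fact that the product of two binary32 values (24-bit significands) is exactly representable in binary64 (53-bit significand), so the only rounding happens in the final binary64 addition.

```python
import struct


def to_binary32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_fma(a: float, b: float, c: float) -> float:
    """Emulate R = A*B + C with A, B in binary32 and C, R in binary64.

    Hypothetical helper: a software model, not the circuit from the paper.
    """
    a32, b32 = to_binary32(a), to_binary32(b)
    # 24 + 24 = 48 significand bits fit in binary64, so this product is exact.
    product = a32 * b32
    # The addition is the single rounding of the whole operation.
    return product + c


def dot_mixed(xs, ys) -> float:
    """Sum of products with binary32 inputs and a binary64 accumulator."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = mixed_fma(x, y, acc)
    return acc


def dot_binary32(xs, ys) -> float:
    """Same sum of products, with every intermediate rounded to binary32."""
    acc = 0.0
    for x, y in zip(xs, ys):
        p = to_binary32(to_binary32(x) * to_binary32(y))
        acc = to_binary32(acc + p)
    return acc


if __name__ == "__main__":
    xs = [1.0] + [2.0 ** -13] * 4
    ys = [1.0] + [2.0 ** -13] * 4
    print(dot_mixed(xs, ys), dot_binary32(xs, ys))
```

In this example each small product equals 2^-26, which is below half an ulp of 1.0 in binary32, so the pure binary32 accumulation never moves away from 1.0, whereas the binary64 accumulator keeps every contribution.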

Cited literature : 21 references
Contributor : Florent de Dinechin
Submitted on : Thursday, November 17, 2011 - 2:47:06 PM
Last modification on : Wednesday, November 20, 2019 - 3:27:40 AM
Long-term archiving on : Friday, November 16, 2012 - 11:20:57 AM
  • HAL Id : ensl-00642157, version 1



Nicolas Brunie, Florent de Dinechin, Benoît de Dinechin. Mixed-precision Fused Multiply and Add. 45th Asilomar Conference on Signals, Systems & Computers, Nov 2011, United States. pp.165-169. ⟨ensl-00642157⟩


