Faster SPU Clamp
03/14/08 - 18:00 PST - Posted by Mike Day, Senior Engine Programmer, and Jonathan Garrett, Senior Engine Programmer
---- Mike Day Wrote ----
I just had a thought…
Using the ‘standard’ approach, it takes four SPU instructions to clamp a floating-point value (or a vector of 4 values) to the range [0.0f, 1.0f]: compare against zero, select x or zero, compare against 1, select x or 1.
But if you don’t mind a bit of added latency (i.e. you’re inside a pipelined loop), it should be possible in 2 instructions: convert f32 to u32 with a scale of 2^32, then convert back to float using the same scale. This takes advantage of the clamping to [0, 2^32-1] that the f32->u32 conversion does. (It also doesn’t require setting up a vector of zeros and a vector of 1’s.)
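As a point of comparison, the four compare/select steps can be modelled in portable C, one lane at a time. This is only a sketch: `mask_greater`, `select_bits` and `clamp01_compare_select` are hypothetical names standing in for the vector compare and bit-select instructions, not SPU intrinsics.

```c
#include <stdint.h>
#include <string.h>

/* Compare step: all-ones mask where a > b, zero otherwise (one lane). */
static uint32_t mask_greater(float a, float b)
{
    return (a > b) ? 0xFFFFFFFFu : 0u;
}

/* Select step: take the bits of b where mask is set, the bits of a elsewhere. */
static float select_bits(float a, float b, uint32_t mask)
{
    uint32_t ua, ub, ur;
    float r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    ur = (ua & ~mask) | (ub & mask);
    memcpy(&r, &ur, sizeof r);
    return r;
}

/* Clamp four lanes to [0, 1] using two compares and two selects per lane. */
static void clamp01_compare_select(float v[4])
{
    for (int i = 0; i < 4; ++i) {
        float x = v[i];
        uint32_t below = mask_greater(0.0f, x);   /* 1: compare against zero */
        x = select_bits(x, 0.0f, below);          /* 2: select x or zero     */
        uint32_t above = mask_greater(x, 1.0f);   /* 3: compare against 1    */
        x = select_bits(x, 1.0f, above);          /* 4: select x or 1        */
        v[i] = x;
    }
}
```

Note that the constants 0.0f and 1.0f have to live somewhere; on a vector unit that means two extra registers holding splatted constants.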
I expect you wouldn’t get exactly 1 at the top of the range – it would probably be rounded down to the next float below 1.0, which is 0x3F7FFFFF.
Certain other clamp ranges would be possible too… [0, 2^n] using a different scale, and [-2^n, 2^n] using signed conversion.
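The round trip described above can be modelled in scalar, portable C to see where the clamp limits land. This is a sketch under stated assumptions, not the hardware definition: it assumes both conversions truncate toward zero and that the float-to-integer step saturates, and all the `model_*` and `clamp_*` names are hypothetical. NaN handling is a choice made for this model only, and `scale` is assumed to be in the range 0 to 32.

```c
#include <stdint.h>
#include <string.h>

/* 2^scale as a double; this model only uses 0 <= scale <= 32. */
static double pow2(int scale) { return (double)(1ull << scale); }

/* u32 -> float with truncation: drop everything below the top 24
   significant bits, so the conversion that follows is exact. */
static float u32_to_float_trunc(uint32_t u)
{
    if (u != 0u) {
        int msb = 31;
        while ((u >> msb) == 0u) msb--;
        if (msb > 23) u &= ~((1u << (msb - 23)) - 1u);
    }
    return (float)u;
}

/* Model of float -> u32: multiply by 2^scale, truncate, saturate. */
static uint32_t model_cfltu(float x, int scale)
{
    double scaled = (double)x * pow2(scale);
    if (!(scaled > 0.0)) return 0u;          /* negatives (and NaN, in this model) give 0 */
    if (scaled >= 4294967295.0) return UINT32_MAX;
    return (uint32_t)scaled;
}

/* Model of u32 -> float: truncating conversion, then divide by 2^scale. */
static float model_cuflt(uint32_t u, int scale)
{
    return (float)((double)u32_to_float_trunc(u) / pow2(scale));
}

/* Model of float -> s32: multiply by 2^scale, truncate, saturate. */
static int32_t model_cflts(float x, int scale)
{
    double scaled = (double)x * pow2(scale);
    if (scaled != scaled) return 0;          /* NaN: a choice of this model */
    if (scaled >= 2147483647.0) return INT32_MAX;
    if (scaled <= -2147483648.0) return INT32_MIN;
    return (int32_t)scaled;
}

/* Model of s32 -> float: truncate the magnitude, restore the sign, rescale. */
static float model_csflt(int32_t s, int scale)
{
    uint32_t mag = (s < 0) ? (0u - (uint32_t)s) : (uint32_t)s;
    float f = u32_to_float_trunc(mag);
    return (float)((double)((s < 0) ? -f : f) / pow2(scale));
}

/* The two-step clamps: convert out with a scale, convert back with the same scale. */
static float clamp_unsigned(float x, int scale) { return model_cuflt(model_cfltu(x, scale), scale); }
static float clamp_signed(float x, int scale)   { return model_csflt(model_cflts(x, scale), scale); }

/* Helper for inspecting the result's bit pattern. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
```

In this model, `clamp_unsigned(2.0f, 32)` has the bit pattern 0x3F7FFFFF, i.e. the float just below 1.0, while `clamp_signed(-7.0f, 31)` comes back as exactly -1.0f. Lowering the scale by one doubles the upper limit.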
Mike
---- Jonathan Garrett Wrote ----
that's great
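// Clamp each float lane to [0, 1]: scale by 2^32, saturate to u32, convert back.
inline qword ClampZeroToOne(qword q_)
{
    qword a = si_cfltu(q_, 32);
    qword b = si_cuflt(a, 32);
    return b;
}

// Clamp each float lane to [0, 2]: same trick with a scale of 2^31.
inline qword ClampZeroToTwo(qword q_)
{
    qword a = si_cfltu(q_, 31);
    qword b = si_cuflt(a, 31);
    return b;
}

// Clamp each float lane to [0, 4]: same trick with a scale of 2^30.
inline qword ClampZeroToFour(qword q_)
{
    qword a = si_cfltu(q_, 30);
    qword b = si_cuflt(a, 30);
    return b;
}

// Clamp each float lane to [-1, 1]: signed conversion with a scale of 2^31.
inline qword ClampMinusOneToOne(qword q_)
{
    qword a = si_cflts(q_, 31);
    qword b = si_csflt(a, 31);
    return b;
}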
and yes, we lose a bit at the top of the range
Jonny