Research & Development

Faster SPU Clamp

Posted on Mar 14, 2008, 12:00 pm
Mike Day wrote:
 
I just had a thought…
 
Using the ‘standard’ approach it takes 4 SPU instructions to clamp a floating point value (or vector of 4 values) to the range [0.0f, 1.0f]: compare against zero, select x or zero, compare against 1, select x or 1.
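
For reference, the four steps can be sketched in portable scalar C. This is only a one-lane model so it can run on a host machine: cmp_gt and sel are hypothetical stand-ins for the SPU compare-greater-than and bitwise-select instructions, not real intrinsics.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret float bits as an integer and back (no value conversion). */
static inline uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static inline float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical model of a float compare: all-ones mask when a > b. */
static inline uint32_t cmp_gt(float a, float b) { return a > b ? 0xFFFFFFFFu : 0u; }

/* Hypothetical model of a bitwise select: bits of b where mask is set, else a. */
static inline uint32_t sel(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* The 'standard' clamp: two compares and two selects. */
static inline float clamp01_standard(float x)
{
    const float zero = 0.0f, one = 1.0f;
    uint32_t m0 = cmp_gt(x, zero);             /* 1: compare against zero */
    uint32_t r  = sel(f2u(zero), f2u(x), m0);  /* 2: select x or zero     */
    uint32_t m1 = cmp_gt(u2f(r), one);         /* 3: compare against 1    */
    r = sel(r, f2u(one), m1);                  /* 4: select x or 1        */
    return u2f(r);
}
```

Note that the two constants (zero and one) have to live in registers, which is the setup cost mentioned in the next paragraph.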
 
But if you don’t mind a bit of added latency (i.e. you’re inside a pipelined loop), it should be possible in 2 instructions: convert f32 to u32 with a scale of 2^32, then convert back to float using the same scale. This takes advantage of the saturation to [0, 2^32-1] that the f32->u32 conversion performs. It also doesn’t require setting up a vector of zeros and a vector of 1’s.
 
I expect you wouldn’t get exactly 1 at the top of the range – it would probably be rounded down to the next float below 1.0, which is 0x3F7FFFFF.
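
A rough way to check that expectation off-target is a scalar model of the round trip. The sketch below assumes the float-to-unsigned conversion scales by 2^scale, truncates and saturates to [0, 2^32-1], and that the conversion back truncates toward zero (SPU single precision always rounds that way). The names model_cfltu and model_cuflt are hypothetical, and the NaN handling is a host-side assumption.

```c
#include <stdint.h>
#include <string.h>

static inline uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Model of float -> u32 with scale: multiply by 2^scale, truncate, saturate.
   scale is assumed to be in 0..32. */
static inline uint32_t model_cfltu(float x, int scale)
{
    float s = x * (float)(1ULL << scale);        /* exact power-of-two scaling */
    if (!(s > 0.0f)) return 0u;                  /* negatives (and host NaN) -> 0 */
    if (s >= 4294967296.0f) return 0xFFFFFFFFu;  /* saturate at 2^32 - 1 */
    return (uint32_t)s;                          /* C cast truncates toward zero */
}

/* Model of u32 -> float with scale: truncate to 24 significant bits, then
   divide by 2^scale. */
static inline float model_cuflt(uint32_t u, int scale)
{
    int shift = 0;
    while (u >> 24) { u >>= 1; shift++; }        /* drop bits a float cannot hold */
    u <<= shift;
    return (float)u / (float)(1ULL << scale);    /* both steps are now exact */
}

/* Two-conversion clamp to [0, 1). */
static inline float clamp01_convert(float x)
{
    return model_cuflt(model_cfltu(x, 32), 32);
}
```

Under these assumptions clamp01_convert(1.0f) and anything larger come back as 0x3F7FFFFF, and values such as 0.25f pass through unchanged. The round trip also quantizes to steps of 2^-32, so very small inputs lose low mantissa bits.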
 
Certain other clamp ranges would be possible too… [0, 2^n] using a different scale, and [–2^n, 2^n] using signed conversion.
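
The signed variant can be modelled the same way. This is a host-side sketch under the same assumptions as before (scale by 2^scale, saturate to the signed 32-bit range, truncate toward zero on the way back); model_cflts and model_csflt are hypothetical names. With a scale of 31 the range works out to [-1, 1).

```c
#include <stdint.h>
#include <string.h>

static inline uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Model of float -> s32 with scale: multiply by 2^scale, truncate, saturate.
   scale is assumed to be in 0..31. */
static inline int32_t model_cflts(float x, int scale)
{
    float s = x * (float)(1ULL << scale);
    if (s >= 2147483648.0f)  return INT32_MAX;   /* saturate at 2^31 - 1 */
    if (s <= -2147483648.0f) return INT32_MIN;   /* saturate at -2^31    */
    if (s != s) return 0;                        /* host NaN: assumption */
    return (int32_t)s;                           /* truncates toward zero */
}

/* Model of s32 -> float with scale: truncate the magnitude to 24
   significant bits, restore the sign, divide by 2^scale. */
static inline float model_csflt(int32_t i, int scale)
{
    uint32_t mag = i < 0 ? 0u - (uint32_t)i : (uint32_t)i;
    int shift = 0;
    while (mag >> 24) { mag >>= 1; shift++; }
    mag <<= shift;
    float f = (float)mag;
    if (i < 0) f = -f;
    return f / (float)(1ULL << scale);
}

/* Two-conversion clamp to [-1, 1). */
static inline float clamp_pm1_convert(float x)
{
    return model_csflt(model_cflts(x, 31), 31);
}
```

In this model the bottom of the range is exact (-1.0f comes back as -1.0f), while the top lands on 0x3F7FFFFF, so the range is not quite symmetric.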
 
Mike
 
Jonathan Garrett wrote:
 
that's great
 
// Clamp each float to [0, 1): scale of 2^32.
inline qword ClampZeroToOne(qword q_)
{
  qword a = si_cfltu(q_, 32);
  qword b = si_cuflt(a, 32);
  return b;
}

// Clamp each float to [0, 2): scale of 2^31.
inline qword ClampZeroToTwo(qword q_)
{
  qword a = si_cfltu(q_, 31);
  qword b = si_cuflt(a, 31);
  return b;
}

// Clamp each float to [0, 4): scale of 2^30.
inline qword ClampZeroToFour(qword q_)
{
  qword a = si_cfltu(q_, 30);
  qword b = si_cuflt(a, 30);
  return b;
}

// Clamp each float to [-1, 1): signed conversion, scale of 2^31.
inline qword ClampMinusOneToOne(qword q_)
{
  qword a = si_cflts(q_, 31);
  qword b = si_csflt(a, 31);
  return b;
}
 
and yes, we lose a bit at the top of the range
 
Jonny
 
 