________________________________________
From: Mike Day
Sent: Friday, March 23, 2007 3:18 PM
To: Mike Acton; tech
Subject: RE: curiously small code...

Today I had cause to return to the question of how to convert an arbitrary (x,y) to an rsx swizzle offset, and vice versa… i.e., how to do a one-off conversion as opposed to a loop over sequential values. I did have one new idea, in the special case where we’re only dealing with spu code…

To swizzle an (x,y) position you have to interleave the bits of x and y. If you could duplicate all the bits of x (e.g. 0b1101 -> 0b11110011) and similarly for y, it would be easy, because from there you could just mask them and or them together. Similarly, the inverse operation of converting a swizzled offset to an (x,y) position would be made easy if you could quickly compact a bit pattern by eliminating every other bit. (Would that be bicimation?)

Look what happens if you do fsmh (form select mask for halfwords) followed by gbb (gather bits from bytes) on the binary value 0b1101:

    si_fsmh(0b1101) = 0x00000000_00000000_FFFFFFFF_0000FFFF
    si_gbb(0x00000000_00000000_FFFFFFFF_0000FFFF) = 0b11110011

That’s bit duplication! And now look what happens if you take the value 0b11110011 and apply fsmb (form select mask for bytes) followed by gbh (gather bits from halfwords):

    si_fsmb(0b11110011) = 0x00000000_00000000_FFFFFFFF_0000FFFF
    si_gbh(0x00000000_00000000_FFFFFFFF_0000FFFF) = 0b1101

That’s bicimation!

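For anyone without an spu to hand, the net effect of the two instruction pairs can be modelled in portable C. This is only a minimal sketch: the names dup_bits8 and compact_bits16 are made up for illustration, and the loops emulate the combined result of fsmh+gbb and fsmb+gbh on small inputs, not the instructions themselves.

```c
#include <stdint.h>

/* Hypothetical helper: duplicate each of the 8 bits of v, so bit i of v
   lands in bits 2i and 2i+1 of the result (net effect of fsmh then gbb). */
static uint16_t dup_bits8(uint8_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; ++i) {
        uint16_t bit = (uint16_t)((v >> i) & 1u);
        r |= (uint16_t)(bit << (2 * i));
        r |= (uint16_t)(bit << (2 * i + 1));
    }
    return r;
}

/* Hypothetical helper: keep only the even-numbered bits of v and pack them
   together, so bit 2i of v becomes bit i of the result (net effect of
   fsmb then gbh). */
static uint8_t compact_bits16(uint16_t v)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; ++i)
        r |= (uint8_t)(((v >> (2 * i)) & 1u) << i);
    return r;
}
```

With these, dup_bits8(0x0D) gives 0xF3 (0b1101 -> 0b11110011) and compact_bits16(0xF3) gives back 0x0D, matching the worked values above.
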
Based on this idea, modified a little, here’s a couple of test functions I wrote using the Linux PS3 to swizzle and unswizzle:

#include <spu_intrinsics.h>

unsigned swizzle8(unsigned x, unsigned y)
{
    /* Shuffle pattern: take bytes 8..15 of qy (selectors 0x18..0x1F) and
       bytes 8..15 of qx (selectors 0x08..0x0F) alternately, y first. */
    const qword mask = (qword)(vector unsigned){0x18081909,0x1A0A1B0B,0x1C0C1D0D,0x1E0E1F0F};
    const qword qx = si_fsmb(si_from_uint(x));   /* one byte per bit of x */
    const qword qy = si_fsmb(si_from_uint(y));   /* one byte per bit of y */
    const qword qs = si_shufb(qx, qy, mask);     /* interleave the bytes */
    const qword qresult = si_gbb(qs);            /* gather back to 16 bits */
    return si_to_uint(qresult);
}

vector unsigned unswizzle8(unsigned s)
{
    const qword qs = si_fsmb(si_from_uint(s));   /* one byte per bit of s */
    const qword qx = si_gbh(qs);                 /* every other bit -> x */
    const qword qs_rot = si_rotqbyi(qs, 15);     /* rotate by one byte so the other bits line up */
    const qword qy = si_gbh(qs_rot);             /* every other bit -> y */
    /* x is already in element 0; put y in element 1 */
    return spu_insert(si_to_uint(qy), (vector unsigned)qx, 1);
}

These might be a bit awkwardly coded, because I’ve never used spu intrinsics, but the meat of each function (as it would appear, say, in an asm loop) is just 4 spu instructions. The main limitation is that these functions only work for up to 8-bit x and y values (hence the names swizzle8 and unswizzle8) and a 16-bit swizzled offset, but it’s easy to see that with a little extra shifting and masking, and/or shuffling, you could do versions that deal with 16-bit x and y and a 32-bit swizzle offset.

Mike
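
A note for checking the functions above off-target: the sketch below is a plain scalar reference for the 16-bit x and y / 32-bit offset case. It uses the usual shift-and-mask bit spreading rather than the fsmb/gbb trick, so it is a different technique and is only meant for comparing results. All the names (spread16, compact32, swizzle16_ref, unswizzle16_ref) are hypothetical, and it assumes x occupies the even bits of the offset and y the odd bits, which is how the shuffle mask in swizzle8 appears to lay them out.

```c
#include <stdint.h>

/* Spread the low 16 bits of v out to the even bit positions of a 32-bit word. */
static uint32_t spread16(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Inverse of spread16: pack the even bits of v into the low 16 bits. */
static uint32_t compact32(uint32_t v)
{
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

/* Assumed layout: x in the even bits, y in the odd bits. */
static uint32_t swizzle16_ref(uint32_t x, uint32_t y)
{
    return spread16(x) | (spread16(y) << 1);
}

static void unswizzle16_ref(uint32_t s, uint32_t *x, uint32_t *y)
{
    *x = compact32(s);       /* even bits */
    *y = compact32(s >> 1);  /* odd bits */
}
```

If the layout assumption holds, swizzle16_ref(x, y) should agree with swizzle8(x, y) for any x and y below 256, which gives a quick way to test the spu versions against a known-good answer.
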