Why rotqmbybi is Broken
8/2/07 - 12:00 PST - Posted by Mike Day, Engine Programmer
I feel that the operation carried out by the instruction rotqmbybi is broken. The instruction executes as documented, but I don’t believe it achieves what its designers originally planned. (Or, if it was designed that way, I contend that it was a bad design.)
To explain, it’s necessary first to give a bit of background. There are left shifts (and rotates), and there are right shifts (and rotates). On the SPU, a right shift is treated as a left shift by a negative shift amount. This is an unfortunate convention for various reasons. It replaces mnemonics familiar to most assembly programmers, such as variants on ‘shr’ for shift-right, with unwieldy and not especially memorable abbreviations based on the ambiguous-sounding phrase ‘rotate and mask’. Given the implied convention that a rotate by a positive amount is a rotate-left, a rotate-and-mask by a negative count describes a right-shift, filling with zeros at the left (the filling with zeros corresponding to the ‘and-mask’ part of the name).
A more serious problem comes in the form of increased susceptibility to bugs. For example, to shift each word of a quadword by one bit to the right, one would use the immediate instruction rotmi (for rotate-and-mask-word-immediate), supplying an immediate shift amount of –1. The likelihood of forgetting the minus sign is very high, since programmers tend to think of the operation as a right-shift by +1, and this leads to many bugs. (At this point it’s also worth mentioning that the instruction rotma, which propagates the sign bit, is described in the manual as an ‘algebraic’ rotate-and-mask. I think this is inconsistent with the nomenclature used for the majority of architectures, where the letter ‘a’ in the name usually refers to an ‘arithmetic’ shift, contrasting it with the default kind, a ‘logical’ shift.)
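To make the missing-minus-sign hazard concrete, here is a small portable C model of what rotmi does to a single 32-bit word. The helper name model_rotmi_word is hypothetical (it is not an SPU intrinsic), and the shift-count rule it encodes is my reading of the SPU ISA manual rather than something stated above, so treat it as a sketch.

```c
#include <stdint.h>

/* Hypothetical helper, not an SPU intrinsic: models rotmi on one 32-bit word.
   Assumed rule (from my reading of the SPU ISA manual):
     count = (0 - immediate) & 0x3F
   and a count of 32 or more yields zero. */
static uint32_t model_rotmi_word(uint32_t word, int32_t imm)
{
    uint32_t count = (0u - (uint32_t)imm) & 0x3Fu;
    return (count < 32u) ? (word >> count) : 0u;
}
```

Under this model, an immediate of –1 gives a count of 1 and the word is shifted right by one bit as intended. An immediate of +1 gives a count of 63, so the word is simply cleared to zero, which is why the forgotten minus sign tends to show up as a silent wrong result rather than an obvious error.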
It is also unfortunate that when using the variable-shift form of right-shifts, an extra instruction must be inserted just to negate the shift amount held in the register. While multi-instruction operations are often a sacrifice one must make when trading off against other properties, such as orthogonality of instruction set, or quantities like instruction word size, the point of contention here is inconsistency: an immediate shift-left in one instruction, a variable shift-left in one instruction, an immediate shift-right in one instruction, yet a variable shift-right in two instructions.
Shifts and rotates by any amount are possible within 32-bit words (or 16-bit halfwords) inside a quadword. When it comes to shifts and rotates of an entire quadword by an arbitrary amount, no single instruction can do it; instead these must be broken down into two operations: one which moves whole bytes around, and one which moves by a sub-byte number of bits to complete the shift. To illustrate, consider using si_ intrinsics to left-shift a quadword q by an amount stored in s:
q = si_shlqbybi(q, s);
q = si_shlqbi(q, s);
If in a particular case s has the value 43, the first instruction corresponds to shifting left by 43 div 8 = 5 full bytes, and the second by 43 rem 8 = 3 extra bits. When these two instructions are combined, any left-shift amount is possible on the full 128-bit register, and this seems like a reasonable compromise. (The two instructions can be performed in either order; the net effect is the same.)
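The split between the two instructions can be sketched in portable C. The helper names below are hypothetical, and the bit fields they read (bits 24 to 28 for the byte count, bits 29 to 31 for the bit count, in the SPU's big-endian bit numbering) are an assumption based on the SPU ISA manual; the sketch models only the counts, not the quadword data.

```c
#include <stdint.h>

/* Hypothetical models of how the left-shift pair reads its count from s.
   Assumption: shlqbybi uses bits 24..28 of the word (the upper five bits of
   the lowest byte) and shlqbi uses bits 29..31 (the lowest three bits). */
static unsigned left_byte_count(uint32_t s)
{
    return (s >> 3) & 0x1Fu;   /* whole bytes: s div 8, kept to 5 bits */
}

static unsigned left_bit_count(uint32_t s)
{
    return s & 7u;             /* leftover bits: s rem 8 */
}
```

For s = 43 this gives 5 bytes and 3 bits, and 5 × 8 + 3 recovers the full 43-bit shift, which is why the same register can be handed to both instructions unchanged.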
Corresponding to this pair are the instructions rotqmbybi / rotqmbi. Together these perform a right-shift of a quadword by any amount, and they maintain the convention of regarding the operation as a left-shift by a negative count. So, we might expect that if s held the value –43, the sequence
q = si_rotqmbybi(q, s);
q = si_rotqmbi(q, s);
ought to shift q to the right by 43 bits. In fact it will shift q to the right by 51 bits. Because of the way rotqmbybi determines its byte-shift count, the first instruction here shifts not by 5 bytes but by 6, and 6 bytes (48 bits) plus the 3 bits from rotqmbi makes 51. This is the respect in which I believe rotqmbybi to be ‘broken’, as alluded to at the top.
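The specs for rotqmbybi show that the byte count is determined as (0 minus bits 24 to 28 of RB) modulo 32. It seems to me that the instruction would have worked more sensibly had the expression been (bits 24 to 28 of (0 minus RB)) modulo 32. For s = –43, the low byte of RB is 11010101 in binary, so bits 24 to 28 hold 11010 = 26, and (0 minus 26) modulo 32 is 6. Negating first would give 43, whose bits 24 to 28 hold 00101 = 5, the count we actually wanted.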
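The two expressions can be compared side by side in portable C. All three function names are hypothetical; the documented byte-count rule is the one quoted above, while the bit-count rule for rotqmbi, (0 minus bits 29 to 31 of RB) modulo 8, is an assumption taken from the SPU ISA manual.

```c
#include <stdint.h>

/* Byte count as documented: (0 - bits 24..28 of RB) mod 32. */
static unsigned documented_byte_count(int32_t rb)
{
    uint32_t field = ((uint32_t)rb >> 3) & 0x1Fu;
    return (0u - field) & 0x1Fu;
}

/* Byte count as proposed in the text: bits 24..28 of (0 - RB), mod 32. */
static unsigned proposed_byte_count(int32_t rb)
{
    uint32_t negated = 0u - (uint32_t)rb;
    return (negated >> 3) & 0x1Fu;
}

/* Assumed rotqmbi rule: (0 - bits 29..31 of RB) mod 8. */
static unsigned documented_bit_count(int32_t rb)
{
    uint32_t field = (uint32_t)rb & 7u;
    return (0u - field) & 7u;
}
```

With rb = –43 the documented rule yields 6 bytes and the proposed rule yields 5, while the bit count is 3 either way. The documented pair therefore totals 6 × 8 + 3 = 51 bits instead of the intended 43.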
It is possible to patch up the value in s to account for this; however, we must now supply two different values of s to the two component instructions. The net result is that where we would usually add in the extra instruction needed to negate the shift amount in preparation for a rotate-and-mask, we must now supply TWO extra instructions: the usual negation of the shift amount for rotqmbi, but now also SEVEN minus the shift amount for rotqmbybi. (For a right-shift by 43, rotqmbybi is given 7 minus 43 = –36, from which it derives the intended 5 bytes.) Where variable rotate-and-mask operations are at best unwieldy and counterintuitive, in this case specific knowledge of this non-standard behaviour is also required.
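As a sanity check on the patch-up, the following portable C sketch models the total right-shift obtained with and without the adjustment, for a positive shift amount n. The function names are hypothetical, and the count rules are the ones assumed from the SPU ISA manual: (0 minus bits 24 to 28) modulo 32 for rotqmbybi and (0 minus bits 29 to 31) modulo 8 for rotqmbi. The intrinsic sequence in the comment is likewise an untested assumption, relying on si_sfi computing (immediate minus register).

```c
#include <stdint.h>

/* Assumed SPU-side sequence (not compiled here; si_sfi(ra, imm) is taken
   to compute imm - ra):
     qword neg = si_sfi(s, 0);      0 - s, for rotqmbi
     qword adj = si_sfi(s, 7);      7 - s, for rotqmbybi
     q = si_rotqmbybi(q, adj);
     q = si_rotqmbi(q, neg);                                              */

/* Hypothetical models of the two count rules. */
static unsigned rotqmbybi_bytes(int32_t rb)
{
    return (0u - (((uint32_t)rb >> 3) & 0x1Fu)) & 0x1Fu;
}

static unsigned rotqmbi_bits(int32_t rb)
{
    return (0u - ((uint32_t)rb & 7u)) & 7u;
}

/* Total shift when both instructions are given the plain negation -n. */
static unsigned naive_total_bits(int32_t n)
{
    return rotqmbybi_bytes(-n) * 8u + rotqmbi_bits(-n);
}

/* Total shift when rotqmbybi gets 7 - n and rotqmbi gets -n. */
static unsigned workaround_total_bits(int32_t n)
{
    return rotqmbybi_bytes(7 - n) * 8u + rotqmbi_bits(-n);
}
```

Under this model the adjusted pair reproduces n exactly for shifts of 0 to 127 bits, whereas the naive pair gives 51 for n = 43. The naive pair is only correct when n is a multiple of 8, since in that case negating the five-bit field and negating the whole value agree.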