The compare-and-swap (CS) built-in function needs the old and new values to be consistent with each other even while other tasks or threads update the fullword: the new value must be computed from the same fetch that is passed as the old value. If you save the fullword into a stack variable thinking that takes care of it, and optimization is turned on, you will probably find that the compiler has eliminated the store to and fetch from that stack variable. The old and new values are then taken from separate fetches of the shared storage, and the new value can be incorrect.
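A portable sketch of the intended pattern follows. It uses C11 `<stdatomic.h>` as a stand-in for the z/OS CS built-in, assuming the compiler level in use supports C11 atomics. `atomic_load` performs exactly one fetch of the shared fullword, the new value is computed only from that local copy, and `atomic_compare_exchange_strong` writes the current contents back into `old` on failure, much as CS reloads its first operand. The names `shared_word` and `add_to_word` are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared fullword updated by several tasks/threads. */
static _Atomic int32_t shared_word = 0;

/* Atomically add delta to *word and return the new value. */
static int32_t add_to_word(_Atomic int32_t *word, int32_t delta)
{
    /* Exactly one fetch from shared storage; the optimizer may not
       replace the local copy with additional fetches. */
    int32_t old = atomic_load(word);
    int32_t new_value;

    do {
        /* The new value is derived only from the local copy of old. */
        new_value = old + delta;
        /* On failure, old is refreshed with the current contents of *word
           (similar to CS reloading its first operand) and we retry. */
    } while (!atomic_compare_exchange_strong(word, &old, new_value));

    return new_value;
}
```

With atomics, the "fetch once, compute from the copy" rule is part of the language semantics rather than something the optimizer has to be tricked into respecting.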
I was able to defeat the optimization by adding a variable, defined outside the source module, that always contains +1; subtracting a literal 1 from it; and adding the result (always 0) to the fullword. Because the optimizer cannot see that the external value is constant, it now keeps the stack variable.
But is there a better way?
I have lots of compare-and-swaps that worked fine when built with a different compiler, and I don't want to put this kludge into many places in many modules.
For example, could I code a function, perhaps in assembler to avoid prolog/epilog overhead, that is passed the fullword to change and returns its current value?
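A helper like that can be sketched in plain C instead of assembler by reading the fullword through a `volatile`-qualified pointer: the C standard requires each volatile access to be performed as written, so the compiler can neither drop the single fetch nor split it into several. This is a minimal sketch under assumptions: `fetch_fullword`, `cs_add` and `counter_word` are hypothetical names, and GCC/Clang's `__sync_bool_compare_and_swap` stands in for the z/OS `__cs` built-in so the example compiles off z/OS. On z/OS you would substitute `__cs` (from `<builtins.h>`) and check its operand conventions in the compiler documentation.

```c
#include <stdint.h>

/* Hypothetical shared fullword. volatile forces every source-level
   read to be a real fetch from storage. */
static volatile int32_t counter_word = 0;

/* Hypothetical helper: returns the fullword using exactly one fetch. */
static inline int32_t fetch_fullword(const volatile int32_t *word)
{
    return *word;
}

/* Add delta with a compare-and-swap retry loop; returns the new value. */
static int32_t cs_add(volatile int32_t *word, int32_t delta)
{
    int32_t old_value;
    int32_t new_value;

    do {
        old_value = fetch_fullword(word);  /* single fetch */
        new_value = old_value + delta;     /* computed from the local copy only */
        /* Stand-in for __cs: stores new_value only if *word still equals
           old_value, and returns nonzero when the swap succeeded. */
    } while (!__sync_bool_compare_and_swap(word, old_value, new_value));

    return new_value;
}
```

Because the helper is an ordinary inline function, it should not add prolog/epilog overhead once inlined, and the CS call sites only change in how they fetch the old value.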