Programming
c++ c assembly x86 micro-optimization
Updated Thu, 11 Aug 2022 10:46:11 GMT

Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction


Can a single x86 instruction toggle a boolean value between '0' and '1'?

I thought of following ways but all result in two instruction with -O3 flag of gcc.

status =! status;
status = 1 - status;
status  = status == 0 ? 1: 0;
int flip[2] = {1, 0};
status = flip[status];

Is there a faster way to do this?

This is what I tried: https://godbolt.org/g/A3qNUw


What I need is a function which toggles the input and returns, written in a way that compiles to one instruction. Something similar to this function:

int addOne(int n) { return n+1; }

compiles on Godbolt to this:

  lea eax, [rdi+1]    # return n+1 in a single instruction
  ret



Solution

To flip a bit in an integer, use xor like this: foo ^= 1.

gcc knows this optimization already for bool, so you can return !status; like a normal person without losing any efficiency. gcc does compile status ^= 1 to an xor instruction as well. In fact, all your ideas except the table lookup compile to a single xor instruction with bool input / return value.

Check it out on the Godbolt compiler explorer with gcc -O3, with asm output panes for bool and int.

MYTYPE func4(MYTYPE status) {
    status ^=1;
    return status;
}
  # same code for bool or int
  mov eax, edi
  xor eax, 1
  ret

vs.

MYTYPE func1(MYTYPE status) {
    status = !status;
    return status;
}
  # with -DMYTYPE=bool
  mov eax, edi
  xor eax, 1
  ret
  # with int
  xor eax, eax
  test edi, edi
  sete al
  ret

Why is bool different from int?

The x86-64 System V ABI requires that callers passing a bool pass a 0 or 1 value, not just any non-zero integer. Thus, the compiler can assume that about the input.

But with int foo, the C expression !foo requires "booleanizing" the value. !foo has type _Bool / (aka bool if you #include <stdbool.h>), and converting that back to an integer must produce a value of 0 or 1. If the compiler doesn't know that foo must be 0 or 1, it can't optimize !foo to foo^=1, and can't realize that foo ^= 1 flips a value between truthy / falsy. (In the sense that if(foo) means if(foo != 0) in C).

This is why you get test/setcc (zero-extended into a 32-bit int by xor-zeroing a register before the test).

Related: Boolean values as 8 bit in compilers. Are operations on them inefficient?. Stuff like (bool1 && bool2) ? x : y isn't always compiled as efficiently as you might hope. Compilers are pretty good, but do have missed-optimization bugs.


What about that extra mov instruction?

It will go away when inlining, if the compiler doesn't need / want to keep the old un-flipped value around for later. But in a stand-alone function, the first arg is in edi, and the return value needs to be in eax (in the x86-64 System V calling convention).

Tiny functions like this are a close approximation to what you might get as part of a large function (if this flip couldn't be optimized into something else), but needing the result in a different register is a confounding factor.


x86 doesn't have a copy-and-xor integer instruction, so for a stand-alone function it will take at least a mov to copy from the arg-passing register to eax.

lea is special: it's one of the few integer ALU instructions that can write the result to a different register instead of destroying its input. lea is a copy-and-shift/add instruction, but there's no copy-and-xor instruction in x86. Many RISC instruction sets have 3-operand instructions, for example MIPS could do xor $t1, $t2, $t3.

AVX introduced non-destructive versions of vector instructions (saving a lot of movdqa / movups register-copying in a lot of code), but for integer there are only a few new instructions that do different things. rorx eax, ecx, 16 for example does eax = rotate_right(ecx, 16), and uses the same VEX encoding that non-destructive AVX instructions use.





Comments (1)

  • +0 – This is a super interesting answer — Mar 01, 2018 at 21:50