Modern compiler optimizations are a sight to behold.
When I extract the bitrev32() function from sys/dev/fdt/if_dwge.c
and compile it on its own on aarch64, clang with optimization
enabled recognizes its purpose and reduces the arithmetic to a single
"rbit" instruction.
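
For readers without the source at hand, here is a sketch of the usual
mask-and-shift way to reverse 32 bits. It is an illustration of the
technique, not a copy of the function in if_dwge.c, and the name
bitrev32_sketch() is made up for this mail.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a mask-and-shift 32-bit reversal;
 * not copied from if_dwge.c.
 */
uint32_t
bitrev32_sketch(uint32_t x)
{
	/* Swap adjacent bits, then bit pairs, nibbles, and bytes. */
	x = ((x & 0xaaaaaaaaU) >> 1) | ((x & 0x55555555U) << 1);
	x = ((x & 0xccccccccU) >> 2) | ((x & 0x33333333U) << 2);
	x = ((x & 0xf0f0f0f0U) >> 4) | ((x & 0x0f0f0f0fU) << 4);
	x = ((x & 0xff00ff00U) >> 8) | ((x & 0x00ff00ffU) << 8);

	/* Finally swap the two 16-bit halves. */
	return (x >> 16) | (x << 16);
}
```

Something like "clang -O2 -S" on a file containing only this function
should be enough to look at the generated code; whether you get "rbit"
will presumably depend on the compiler version.
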
Somewhat less (or more?) amazingly, this appears to be a late-stage
optimization that is sabotaged by earlier passes.
When compiling the full if_dwge.c, clang seems to inline the function,
and then strips it down to the fragments needed to reverse the few
bits actually used.
Over in the sys/dev/rasops code, the same bit reversal is performed
by the MBE() macro. Here clang recognizes that the code is called
inside some loops and hoists the invariant parts: it loads the
eight constants into registers up front and leaves only the
and/or/shift operations inside the loop.
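
To make the loop case concrete, here is a hand-written sketch of what
that transformation amounts to. The function bitrev_buf_sketch() and
its buffer argument are hypothetical, not taken from rasops; the eight
masks are written as locals ahead of the loop, which is roughly where
the compiler ends up putting them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop: reverse the bits of every word in buf. */
void
bitrev_buf_sketch(uint32_t *buf, size_t n)
{
	/* Loop-invariant masks; these can sit in registers throughout. */
	const uint32_t m1h = 0xaaaaaaaaU, m1l = 0x55555555U;
	const uint32_t m2h = 0xccccccccU, m2l = 0x33333333U;
	const uint32_t m4h = 0xf0f0f0f0U, m4l = 0x0f0f0f0fU;
	const uint32_t m8h = 0xff00ff00U, m8l = 0x00ff00ffU;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t x = buf[i];

		/* Only and/or/shift work remains inside the loop. */
		x = ((x & m1h) >> 1) | ((x & m1l) << 1);
		x = ((x & m2h) >> 2) | ((x & m2l) << 2);
		x = ((x & m4h) >> 4) | ((x & m4l) << 4);
		x = ((x & m8h) >> 8) | ((x & m8l) << 8);
		buf[i] = (x >> 16) | (x << 16);
	}
}
```

Once the masks have been pulled out like this, the loop body no longer
looks like the self-contained pattern from the standalone function.
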
So those clever optimizations of the arithmetic prevent the even
more clever substitution with "rbit".
It's just an observation I thought I'd share. Let's file it under
"the compiler moves in a mysterious way".