Branchless code sequences

wsxcde · on April 23, 2015

Scanning http://www.hackersdelight.org/aha/aha.pdf, it looks like Aha performs some form of exhaustive search over all programs? An SMT solver is probably a better way of doing this search; there's been some work in this direction.

Jha, Gulwani and colleagues have a very nice paper on oracle-guided synthesis. They show how you can generate loop-free bitvector programs using SMT: http://www.eecs.berkeley.edu/~sseshia/pubdir/synth-icse10.pd.... John Regehr has also written about this on his blog: http://blog.regehr.org/archives/1146.

Part of the complexity here in an SMT solution might be encoding all the bizarre instructions that you could possibly take advantage of in x86-like instruction sets.

TheLoneWolfling · on April 23, 2015

Never underestimate the utility of brute force & ignorance.

caf · on April 24, 2015

John Regehr's most recent post is also on an example superoptimisation - in this case one where the generated answer isn't obviously correct.

http://blog.regehr.org/archives/1219

beeworker · on April 23, 2015

Aha reminds me of STOKE: https://github.com/StanfordPL/stoke-release

murbard2 · on April 23, 2015

The advantage of brute force generation is that you can eliminate a lot of solutions extremely fast by trying random inputs.

andrewchambers · on April 24, 2015

This sort of thing was automated to entire binaries in this paper.

http://theory.stanford.edu/~aiken/publications/papers/asplos...

They managed to beat gcc -O3 - "we show speedups from 1.7 to a factor of 10 on some compute intensive kernels"

andrewchambers · on April 24, 2015

O2* , here is a cached version http://webcache.googleusercontent.com/search?q=cache:http://...

haberman · on April 23, 2015

Interesting, I'd never heard of the technique of searching all possible instruction sequences to find one with desired characteristics. It's the Infinite Monkey Theorem for code.

molyss · on April 23, 2015

Here's the original paper on super optimization by Alexia Massalin : http://courses.cs.washington.edu/courses/cse501/15sp/papers/...

I ran into it a couple years ago, after hearing from other research from her. Her papers are really impressive. I would have loved that the author of the article had mentioned her.

On a side note, I wouldn't be surprised if the JIT compilers were doing some use of this, allowing some interpreted language to have better performances than compiled C on some benchmarks

EDIT : formatting and JIT addition

kevinnk · on April 23, 2015

JIT compilers are time constrained so super optimization isn't a very good match.

chrisseaton · on April 23, 2015

Why are JIT compilers time constrained? If you have a banking system that runs all day every day, why does it matter if it takes 5 minutes to compile the hot spots at the top tier of optimisation?

oso2k · on April 23, 2015

Imagine you happen to walk up to your ATM when it's Mother Banking System is in the middle hotspot compile followed by a GC pause. What would you do if your ATM took 5 minutes+ to respond to your Withdrawal Request?

You'd probably freak out and worry about getting ATM card back.

chrisseaton · on April 24, 2015

The program doesn't have to stop to run the JIT - it can run it on a background thread.

duaneb · on April 24, 2015

I would hope an ATM isn't built to use a stop the world GC along with a stop the world jit compiler. I hope you realize how absurd you sound making an argument against jit with a real time device.

copsarebastards · on April 24, 2015

Hotspot compilations are blocking? That would surprise and horrify me.

kevinnk · on April 24, 2015

If you have a banking system that runs all day every day, why are you not compiling ahead of time and running super optimization then? JIT is most useful where the application or the target is highly dynamic and you aren't able to compile ahead of time.

chrisseaton · on April 24, 2015

If the workload characteristic changes over time you may want to recompile with different profiling data.

WalterBright · on April 24, 2015

That paper caused a small sensation in the compiler community at the time, as everyone rushed to implement it!

fit2rule · on April 24, 2015

Where is this compiler community?

jamesrom · on April 24, 2015

Is there any other reason to use a tool like this other than eliminating branch misprediction?

rdc12 · on April 24, 2015

Superoptimisers can be used on any sequance of instructions, to find a more optimial sequance. Since they approach the problem in a very different way to a compiler, it can do a better job in some cases. And it can work on hand written assembler too.

They tend to be good at finding obscure solutions.

darkmighty · on April 23, 2015

Anyone know why compilers don't already do this for small branches? Not worth the effort?

AReallyGoodName · on April 24, 2015

You miss out on branch prediction. There's a bit of a myth that branches are always bad because a branch misprediction is slower than a correctly predicted branch and therefore no branches are better.

A bit of code that eliminates branches but has dependant variables (A must be computed before B can be computed) will have pipeline stalls while the computation for A is completed. Yes it might be faster than a branch misprediction but it will never be as fast as a correctly predicted branch (which takes effectively 0 cycles).

Notice the one thing missing in this post? Benchmarks!

duaneb · on April 24, 2015

Among other things, there are branch prediction optimal algorithms that either a) only mispredict the final branch, or b) minimize amortized mispredictions.

lysium · on April 23, 2015

I'd say probably too much effort?

First, it does an exhaustive search on all possible instruction combinations.

Second, it does not _prove_ that the resulting instructions do the same as the original code. It shows it only for some random numbers and border cases.

TheLoneWolfling · on April 23, 2015

It's fairly easy to extend this to either an exhaustive search afterwards, or to run into a SMT solver to prove that it's correct / incorrect.

Though that's probably too much machinery to add on to a compiler.

Guvante · on April 23, 2015

There has been talk of using superoptimizers in larger compilers however it is usually not worth the effort unless you know a problem case in particular.

Especially since these kinds of transformations could slow down your code if the branch predictor is working well for your code set.

rdc12 · on April 24, 2015

Both GCC and Clang can compile to branchless solutions for some conditional statements. The FizzBuzz problem for example both GCC and Clang can turn that into a small lookup table

adwf · on April 23, 2015

Halting problem perhaps? Even if you put in safeguards, an exhaustive search could still be unacceptably slow. You can never tell when a pathological case might arise.

wehadfun · on April 23, 2015

what kind of a bump does this produce

chipaca · on April 24, 2015

Goosebumps.