I ran into this problem last year: ordinary mutexes were not working out, for various reasons.
Since I do embedded programming, I looked through the ARM manuals and documentation, to see what they have on offer.
I ran into the lovely LDREX and STREX instructions, which are just awesome.
It was pretty trivial to rewrite e.g. a custom allocator that used to grab a mutex, poke its internal freelists, and release the mutex so that it instead does the poking with LDREX/STREX, thus obtaining thread-safety while doing the actual work it wanted to do. Lock-free programming can be beautiful like that.
Code that works correctly on that particular programmer's x86 machine, you mean. "Works great on my PC at home and at work" does not necessarily imply "works great on the 64-CPU production machine in the datacenter".
Shouldn't atomics behave similarly across all platforms? I am thinking of the GCC builtin atomics [1].
What sort of code are you thinking about? Even lower-level, non-abstracted atomics and memory fences specific to a particular architecture? I actually did run into code like this for a MIPS-based CPU, and we converted it to the GCC builtins for portability. The funny thing is that the next hardware architecture's toolchain wasn't GCC-based, so there was no benefit: we had to write wrappers for the atomics anyway, and hunt down some absurd bugs related to L2 cache writebacks and memory fences. It was an exercise in how long I could stick with a problem; some serious hunting and staring at debug output and hex dumps.
Does side-effect-free programming with persistent data structures on a share-nothing, actor-based architecture (in other words, Erlang) qualify as lock-free?
While other responses have addressed the deadlock part of this, and someone else addressed the side-effect-free part, I wanted to make sure you saw why actors necessarily involve side effects.
%Here is a function that rather idiotically takes a process
%id, a message, and just sends the message to the process,
%in Erlang, and then returns the atom 'ok'.
send_message(Pid, Message) ->
    Pid ! Message,
    ok.
If this were side effect free, that first line of the function, Pid ! Message, could safely be removed by the compiler: either it returns nothing, or we are doing nothing with the return value; either way, if it has no side effects it is not doing anything. Much like if I were to replace it with something that obviously causes no side effects, such as 5 + 7; we're not doing anything with the 12 that it returns, so why bother calculating it?
But obviously that is not the case; we are expecting the process that Pid corresponds to to receive that message, and to do something with it.
The very act of sending a message is a side effect.
Come on, when people talk about code without side effects they mean so-called pure functions, which always produce the same output for the same input. Message passing is a means of communication, which is all about causing effects, so let's not take this to the absurd.
On the other hand, non-networking procedures in Erlang are pure functions, so they can safely be evaluated in parallel. Its spawned processes are isolated; they share nothing, not even a common stack, so they can fail and be restarted without affecting any other code or data.
It is much better to read Armstrong's thesis, where he gives an outline of the design of Erlang, the hows and the whys. The language may have a somewhat ugly syntax, but the semantics are fine.
My post wasn't a criticism, just that saying Erlang is side effect free (per the OP) is misplaced. Erlang is an insanely pragmatic language that I use daily, but it very intentionally does not bend over backwards to avoid side effects, instead allowing them in cases where they make sense.
No. Lock-free applies to shared, mutable data structures. Erlang (and Haskell in most cases) operate at a much higher level. Remember that side-effect-free programming is a high-level abstraction over most-certainly-not-side-effect-free code. Erlang's process mailboxes might make use of a lock-free queue, though. Architecting your program with actors is most certainly not lock-free as deadlocks may occur.
But being lock-free doesn't necessarily mean better or even faster.
Even working with persistent data structures as in Clojure is not usually considered lock-free programming (even though there are no locks involved, since no mutation is taking place). But Clojure does support lock-free programming, for example with atoms.
So: no sharing, no problem; no mutation, no need to synchronize access?
According to the classic maxim, the way you structure your data influences the shape of your procedures; that is why list procedures are naturally recursive.
Similarly, if you have no destructive operations, you can partition your data any way you wish and process the parts in parallel; this is, for example, how make -j4 works. As long as your object files share nothing, they can be built in any order, and in parallel.
The idea is that parallelism begins from how you structure your data, and hardware primitives are of much less importance. 10 threads updating one global variable is a disaster no matter what.
> The idea is that parallelism begins from how you structure your data
Sure, but again, you're talking high-level. At a low-enough level, every operation is destructive, because that's how our machines work. If you're using immutable data structures, then you're allocating memory, so you are bumping (destructively) some heap pointer. Even if that pointer is thread-local, sooner or later some garbage collection will take place, which requires some form of synchronization.
The idea is that somewhere in your software stack, be it the OS or the JVM or whatever, there will be shared mutable state, and that's where lock-free data structures are handy. Some high-level languages abstract that away from you so you don't have to deal with this stuff.
Also, parallelism is only part of the story. Parallelism, or data parallelism, divides the data so that it can be processed cooperatively in parallel, but that's not always what you have. Lock-free data structures are useful when doing concurrency rather than parallelism, i.e. lots of different computations (rather than one), competing for (rather than cooperatively sharing) limited resources.
I'm not sure what you mean by side-effect-free programming on an actor-based architecture. The whole point of actors is to contain side effects, not eliminate them.
Erlang is certainly not lock-free. I'll use a Scala/Akka example since I know it better than Erlang:
class DiningPhilosopher(leftForkHolder: Actor, rightForkHolder: Actor) extends Actor {
  leftForkHolder ! ForkRequest

  def receive = {
    case ForkFromLeft => { rightForkHolder ! ForkRequest; ... }
  }
}
That'll deadlock for sure.
I think stream-based programming without loops (i.e., where the stream processes form an acyclic digraph) will generally be lock-free.
See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.... for the instruction reference on LDREX/STREX.