The "volatile" keyword should *never* be used for C/C++ multithreaded code. It's...

aidenn0 · on Oct 2, 2024

> Please use the C/C++ memory model facilities instead

I should point out that for more than half of my professional career, those facilities did not exist, so volatile was the most portable way of implementing e.g. a spinlock without the compiler optimizing away the check. There was a period after which compilers were aggressively inlining and before C11 came out in which it could be otherwise quite hard to otherwise convince a compiler that a value might change.

int_19h · on Oct 3, 2024

The problem is that volatile alone never portably guaranteed atomicity nor barriers, so such a spinlock would simply not work correctly on many architectures: other writes around it might be reordered in a way that make the lock useless.

It does kinda sorta work on x86 due its much-stronger-than-usual guarantees wrt move instructions even in the absence of explicit barriers. And because x86 was so dominant, people could get away with that for a while in "portable" code (which wasn't really portable).

aidenn0 · on Oct 3, 2024

There's a lot to unpack here.

TL;DR: The compiler can reorder memory accesses and the CPU can reorder memory accesses. With a few notable exceptions, you usually don't have to worry about the latter on non-SMP systems, and volatile does address the former.

The volatile qualifier makes any reads or writes to that object a side-effect. This means that the compiler is not free to reorder or eliminate the accesses with respect to other side-effects.

If you have all 3 of:

A) A type that compiles down to a single memory access

B) within the same MMU mapping (e.g. a process)

C) With a single CPU accessing the memory (e.g. a non-SMP system)

Then volatile accomplishes the goal of read/writes to a shared value across multiple threads being visible. This is because modern CPUs don't have any hardware concept of threads; it's just an interrupt that happens to change the PC and stack pointer.

If you don't have (A) then even with atomics and barriers you are in trouble and you need a mutex for proper modifications.

If you don't have (B) then you may need to manage the caches (e.g. ARMv5 has virtually tagged caches so the same physical address can be in two different cache lines)

If you don't have (C) (e.g. an SMP system) then you need to do something architecture specific[1]. Prior to C language support for barriers that usually means a CPU intrinsic, inline assembly, or just writing your shared accesses in assembly and calling them as functions.

Something else I think you are referring to is if you have two shared values and only one is volatile, then the access to the other can be freely reordered by the compiler. This is true. It also is often masked by the fact that shared values are usually globals, and non-inlined functions are assumed by most compilers to be capable of writing to any global so a function call will accidentally become a barrier.

1: As you mention, on the x86 that "something" is often "nothing." But most other architectures don't work that way.

hinkley · on Oct 2, 2024

I’m surprised that’s true. C borrowed very heavily from Java when fixing the NUMA situations that were creeping into modern processors.

jcranmer · on Oct 2, 2024

The C/C++ memory model is directly derived from the Java 5 memory model. However, the decision was made that volatile in C/C++ specifically referred to memory-mapped I/O stuff, and the extra machinery needed to effect the sequential consistency guarantees was undesirable. As a result, what is volatile in Java is _Atomic in C and std::atomic in C++.

C/C++ also went further and adopted a few different notions of atomic variables, so you can choose between a sequentially-consistent atomic variable, a release/acquire atomic variable, a release/consume atomic variable (which ended up going unimplemented for reasons), and a fully relaxed atomic variable (whose specification turned out to be unexpectedly tortuous).

tialaramex · on Oct 3, 2024

Importantly these aren't types they're operations.

So it's not that you have a "release/acquire atomic variable" but you have an atomic variable and it so happens you choose to do a Release store to that variable, in other code maybe you do a Relaxed fetch from the same variable, elsewhere you have a compare exchange with different ordering rules

Since we're talking about Mutex here, here's the entirety of Rust's "try_lock" for Mutex on a Linux-like platform:

        self.futex.compare_exchange(UNLOCKED, LOCKED, Acquire, Relaxed).is_ok()

That's a single atomic operation, in which we hope the futex is UNLOCKED, if it is we store LOCKED to it with Acquire ordering, but, if it wasn't we use a Relaxed load to find out what it was instead of UNLOCKED.

We actually don't do anything with that load, but the Ordering for both operations is specified here, not when the variable was typed.