Friday, August 12, 2016

Concurrency talks at CppCon 2016

The program for CppCon 2016 is out and there are some really interesting talks coming up.
On the topic of Concurrency, I'm looking forward to these three talks:
- JF Bastien on "No Sane Compiler would optimize atomics"
- Hans Boehm on "Using weakly ordered C++ atomics correctly"
- Paul McKenney on "A lock-free concurrency toolkit for deferred reclamation and optimistic speculation"

JF Bastien has some really good insights into optimizing code with atomics and the different memory orders, so I really want to see what he's going to say this year.
If you don't know who he is, then take a look at his interview from last year's CppCon or from CppCast:
http://concurrencyfreaks.blogspot.pt/2015/11/interviews-for-channel9.html
http://cppcast.com/2015/07/jf-bastien/
and a preview of what his presentation will look like:
https://github.com/boostcon/cppnow_presentations_2016/blob/master/03_friday/no_sane_compiler_would_optimize_atomics.pdf
and if you want to see some funny comments then check the last 10 minutes of this talk where JF Bastien gives some comments to the speaker:
https://www.youtube.com/watch?v=k12BJGSc2Nc

As for Hans Boehm, well, if you're a regular reader from this blog I shouldn't need to introduce him, but in case you just landed here by accident, then he's one of the contributors to the C++ Memory Model... and the Java Memory Model.
This year he's at CppCon and he's going to talk about common mistakes with non seq-cst atomics, so I definitely want to see his talk.
Here is some of the content he's going to go over:
https://docs.google.com/presentation/d/1Eapu4G6QcetO9mOeZSQJAuvxB8kK3pHQwxWbHNOLk8w/pub?start=false&loop=false&delayms=3000#slide=id.p
Funny thing, I was considering giving a talk this year precisely about this topic but I'm not able to go to CppCon so I didn't even submit the proposal. Obviously, I wouldn't expect my talk to be as good as his is going to be, I was thinking of focusing more on examples and what kind of optimizations are safe to do when writing an algorithm using C++ atomics.
... ohh well, maybe I can still propose it next year as a more "hands-on-approach".

And finally, Paul McKenney is going to to give two talks, with the first one being about the Issaquah challenge (which he talked about a couple years ago so I hope he's going to go a bit more in depth this time)
https://www.youtube.com/watch?v=1Q-RH2tiyt0
and the second talk is going to be about safe memory reclamation in concurrency.
It looks like Paul, Michael Wong and Maged Michael are trying to add techniques for safe memory reclamation to the C++ standard, namely, adding RCU and Hazard Pointers.
For those of you who don't know what this means, let me just say that most experts in Concurrency (shared memory concurrency) say that Memory Reclamation is the toughest problem in Concurrency. This includes people like Maurice Herlihy, Erez Petrank, and many others.
If Memory Reclamation is the toughest problem in Concurrency, and Concurrency is the toughest way of programming, then Memory Reclamation must be the hardest thing you can do in Programming.

There are three main ways to do memory reclamation in concurrent (non-blocking) applications: Atomic Reference Counters, RCU, and Hazard Pointers (HP).
They have different properties, like different progress conditions for Readers and Reclaimers. By "Readers" I mean the threads calling methods that don't delete anything in the data structure, and "Reclaimers" are the ones doing the deletions.

Atomic Reference Counters are already in the next standard and should be part of C++17.
Herb Sutter was the one that proposed them and I'm glad he did, but when it comes to concurrent data structures, it is pretty much useless because it is only non blocking in certain situations, and it is the slowest of the three methods for Readers because when you're traversing a list with nodes you need to do two atomic_fetch_add() per node that is traversed.
If you want to understand what situations are those, then take a look at chapter 9.1 of Paul Mckenney's book in this link https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook-e1.pdf

RCU is better known for its implementation in the Linux Kernel, but there are many userspace implementations of RCU. We used some of them in the Left-Right technique.
RCU is wait-free (population oblivious) and super fast for Readers, but it is blocking for Reclaimers which means that if you want high throughput or low latency even when doing removal of nodes/objects from a data structure, this is isn't going to give you what you want.
Even with delegation, where the last reader will delete the node/object, it doesn't change the progress condition for the Reclaimers, and it even makes the Readers worse off because now they may have to delete an unbounded number of objects, thus making the Readers wait-free unbounded, which has much worse tail latency.
RCU is the simplest to use of the three techniques because its semantics for the Readers are similar to those of a Reader-Writer Lock.

Hazard Pointers are typically lock-free for the Readers (wait-free in some situations) and wait-free bounded for the Reclaimers.
The main drawback is that they're slow for Readers because we have to do one sequentially consistent store for every node that is traversed.
It's not as slow as Atomic Reference Counter is for Readers, but it is slower than RCU which pretty much has no synchronization apart from the initial call to rcu_read_lock() and the final call to rcu_read_unlock().
Of the three techniques, this is the hardest to use, not because it is hard per-say, but because it requires a very deep knowledge of the algorithm where you're applying hazard pointers to. If you want to make it wait-free then the difficulty goes up a notch, and it typically requires the algorithm to be designed with HPs in mind.
One of the untold advantages of HPs is that they are memory bounded. This means that at any given time you can not have more than (MAX_THREADS x MAX_HPS) nodes/objects in the retired lists waiting to be deleted (for an 'R' factor of zero). RCU has no such bound and if you have memory constraints this can become problematic and it can impact your tail latency for the Reclaimers.

There are many other methods for safe memory reclamation, but they are either variations on these ideas or they rely on some architecture/OS specific functionality, like needing Posix signals, or some CPU instructions that only exist on some hardware. Those are "outside" of the C++ Memory Model, i.e. you can't implement them using only atomics and the Memory Model and therefore, their implementations are not portable.

Paul McKenney is one of the authors of RCU.
Hazard Pointers are the brain-child of Maged Michael.
Obviously, there is a bit of friendly rivalry between the authors of the two best methods for non-blocking memory reclamation  :)
Having the two of them together in this endeavor to provide safe memory reclamation in C++, is like having Professor Xavier and Magneto joining forces to bring down Apocalypse  :)
Yes, it's really _that_ cool!

Sure, I'm a fan boy and I'm exaggerating. Just like Andreia was saying the other day, you can implement yourself Hazard Pointers and RCU, efficiently, in C++, and you don't really need library support for that, and in fact _we_ have done it.
However, I'm not going to say it was easy (it wasn't, specially the hazard pointers stuff) and most people just want to use a pre-existing library/API which they know is correct and fast for most usages, and that's why having Paul, Maged Michael, and Michael Wong behind this is a great thing for C++ and its community.
If you care about Concurrency and if you're at CppCon this year, make sure to attend this talk, it should be epic!

No comments:

Post a Comment