Journal Club on Concurreny and Parallelism

Now what is a journal club? It means that for this course:

You will read the literature for each session in advance
There will be a test at the beginning of every session
The results will be discussed together
Open questions about the literature will be discussed together
There will be NO LECTURE in those sessions. If you don't read the literature, you won't learn at thing.
You will experience problems understanding the papers. This is OK for the first sessions
You will learn to better understand the papers and discover the critical sections (which usually become questions for the tests (:-))
You will learn to ask questions about papers
You will participate in a workshop on concurrency and parallelism, together with the industry. (One Friday afternoon)
At the end of the journal club, you should have learned two things: First, a lot about current concepts and technologies for multicore programming. Second, how to really use papers to get an understanding of advanced engineering topics.


WARNING!!!!!!!!!!!!!!!!!!! If you do not have the time to read the papers (and some are quite lengthy), please do NOT participate in this course. You will simply drop out after 3 sessions, as you won't understand a thing being discussed. Last time 36 students started and after a short time became 18, which finished successfully.

We will be using two books: The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit. And the seminal "Computer Architecture, a quantitative approach by John L. Hennessy and David A. Patterson. It would not hurt to order them from an Indian Bookstore at a good price...

The literature: Papers on thread-level parallelism and concurrency techniques

Programming Paradigms, Concurrency, Parallelism, Asynchrony etc.

Concurrency and parallelism are core concepts of computing. Peter van Roy has created an overview of the most important concepts underlying out programming languages and it is worth to take a look at those. Programming Paradigms for Dummies: What Every Programmer Should Know, Peter Van Roy. And it is surprisingly hard to define what concurrency, parallelism and asynchrony really mean. Please read the paper for this session.

Basics: Memory Models, Thread local, Multicore etc.

In this session we will start with an understanding of memory models and why they are needed. We will uncover the secrets of volatile, thread level storage, race conditions etc. We will be reading two sections from the Herlihy book, namely chapter 3.8 (Java Memory Model) and Appendix A.1 and A.2. Start with A.2 and test your understanding of multicore code!

Thread-Level Parallelism and the Memory Hierarchy

You might have been surprised about the freedom of languages and compilers regarding your sequential statements. It gets worse when we look at how the hardware treats your code. This session is central about all things regarding multi-threading on current multicore processors and you will understand NOTHING without diving into the gory details. We will be reading Chapter 5: Thread-Level Parallelism. Many of you will notice, that they miss a lot about caching, CPUs etc. The other chapters of this book written by the "gods of risc CPUs" provide all the Information.


Please note, that this chapter is quite big and a bit hard to understand. PLEASE reserve enough time to read it!!!!


Concurrent Objects and Consistency

There is not just one form of consistency, and developments like NoSql databases and asynchronous processing in large scale sites are proof of it. In this session we are taking a big gulp of theory about how parallel processing works - both logically and physical. The session is based on chapter 3 of the Herlihy book. If you find this challenging (and it is), you might IN ADDITION read A foo walks into a bar. Take you time with reading the papers!

Wait and lock-free Algorithms (or is it data structures??)

Fiddling around with weakend consistency rules is one thing. Something else is trying to find algorithms which do not require blocking . This has become a field of current research due to the bad scalability of locking algorithms on multicore architetures. I would like to look at the following algorithms:


Chapter 5.8 in Herlihy: The compare and set operation (needed for wait-free consensus between threads)
3. Chapter 6.4 in Herlihy: Universal wait-free construction. (to see, how complex wait-free algs can become)
4. Chapter 7.5 in Herlihy: Queue Locks (scalable spin-locks)
5. Chapter 8.3 in Herlihy: Reader-Writers Locks (because they are needed so often)

2. Session

Chapter 9.3 in Herlihy: concurrent reasoning (how to use invariants for proofs)
Chapter 9.8 in Herlihy: Non-Blocking Synchronization (lazy and optimistic methods for lists)
Chapter 10.5 und 10.6 in Herlihy: Unbounded lock-free queue and ABA problem
Chapter 16.5 in Herlihy: Work-stealing Dequeues (needed for Fork/Join Framework in Java)

. All can be found in the Herlihy book. For atomic snapshots, the paper by Chandy and Lamport Distributed Snapshots: Determining Global States of Distributed Systems is relevant too.

Professional Shared-State Multithreading in Java: java.util.concurrent

There are still many places, where shared state multithreading has no alternative. The concurrent package has many helpful algorithms and data structures for this case. Java 101: Java concurrency without the pain, Part 1 and Java 101: Java concurrency without the pain, Part 2

Also relevant is Doug Leas paper on the fork-join framework. To get a better understanding of the library, you can look at some animations from javaConcurrentAnimated.jar. To see how the library elements are used, take a look at the talk from Jessica Karr, Concurrency Options on the JVM. All resources are in my owncloud.

A nice talk from Thompson:
Functional Programming

A programming paradigm that has little problems with concurrency is FP. The paper contains some basic information on it. Functional Programming For The Rest of Us . We will focus on continuations, currying, side-effects etc. The sessoin is a preparation for the next sessions on Java8 streams and Reactive Programming.

java8 Streams

Finally Java has got higher order functions. We will look at the requirements of stream.parallel(), distinguish .reduce from .collect and look at the fork/join engine behind.

Consistent Replicated Data Types

What if we cannot afford synchronization but need to parallelize our data processing? Are there data-structures which support some form of "eventual consistency" e.g. because the results can be aggregated alter without problems? This exciting research area is actually a really hot topic at google, facebook and co. We can start with an excellent strangeloop video from Peter at Soundcloud: Consistency without consensus in production systems. There is also a nice introduction to CRDTs and CALM principles in Distributed Systems for fun and profit.. Deeper going information can be found here: CRDTs: Consistency without concurrency control, Mihai Let, Nuno Pregui, Marc Shapiro and Data Structures in the Multicore Age

Software Transactional Memory

This would be a topic in itself and there is a nice intro in the Herlihy book. But I'd like to see it covered together with Clojure in our workshop on C&P in January. We sometimes need the concept of an atomic "unit of work" (ACID). But we don't want programmers to deal with locks. The STM concept offers a solution, where the compiler controls updates to data structures and automatically rolls back, if something went wrong. We use the chapter ont STM in Herlihy to understand the algorithm.

Reactive Processing

We did have an introduction by Benjamin Hensle at the beginning and we won't be able to do much more this term. The Lmax-distributor is certainly a topic for next time. Asynchronous processing pipelines with backpressure. How does the multi-threading work under the hood?

Staring into the abyss - or why thread-level concurrency is mostly nonsense...

If we find some time after the workshop, we could use it to discuss two small papers. One describes an interesting experiment by well-known database enfant terrible Michael Stonebreaker Take the most commonly used technologies for concurrency and parallelismus and run them on a simulated 1000-core machine. Staring into the Abyss: An Evaluation ofConcurrency Control with One Thousand Cores is a sobering experience as it shows, that NO concurrency control technology today is able to deal efficiently with 1000 cores.

Finally, the Linux-legend Thorvalds questions the use of multi-core architectures in many use cases in his post on Avoiding Ping Pong. Some quotes from there: "What's the advantage? You won't get scaling for much longer, and current trends are actually for lower power anyway. So what's the upside of pushing the whole parallelism snake-oil? We know that we need fairly complex OoO CPU's anyway, because people want reasonable performance and it turns out OoO is actually more efficient than slow in-order. The whole "let's parallelize" thing is a huge waste of everybody's time. There's this huge body of "knowledge" that parallel is somehow more efficient, and that whole huge body is pure and utter garbage. Big caches are efficient. Parallel stupid small cores without caches are horrible unless you have a very specific load that is hugely regular (ie graphics". Ok, enough. Go and read the post...

The Workshop: Designing C&P Systems

On 15th January we are going to have a workshop on concurrency and parallelism, together with the industry. I suggest the following topics for short presentations. A final discussion of the different approaches will hopefully increase our understanding of this complicated topic. We will also discuss patterns of applicability with respect to the different technologies.

Some of the ideas for the workshop follow the book "seven concurrency models in seven weeks" by Paul Butcher. and some were suggestions from the course.

Actors, Agents, active Objects

Clojure and Akka are well known for their agent-based architecture. And we should take a look at both. The Clojure approach is well documented in a chapter in Paul Butcher's book and Vanessa Werner is currently working at Exxeta on Akka as part of her master thesis.

Message Passing

Hidden behind the rather arcane topic of "Communicating Sequential Processes", MP has been a core architectural feature of Erlang for many years. And not just in the telco industry. Large-scale sites use many Erlang programs too. The GO language made the concept even more popular. The Butcher book has a chapter on CSP with Clojure and we could use it as an intro for the concept.

Separating Identity from State

State and updates of state are usually the most critical parts of concurrent systems. But it is rather hard to imagine a system, where you can have variables without the state problem. Clojure seems to separate identity from state using STM mechanisms. A good reason to look at how this works by reading the chapter in Paul Butchers book. There is also a chapter on STM in the Herlihy book if you need more background on the algorithms behind. BTW: what level of isolation is provided by the algorithm in the Herlihy book? is it sequential consistency? linearizability? some form of snapshot isolation?

Embarrassingly Parallel: GPU processing

This has not been covered in our course, but it gives us a good chance to better understand the differences between thread-level parallelism (our topic) and what can be done with different kinds of problems, e.g. massively parallel algorithms. The goal would be to clearly distinguish use-cases for both technologies.

Real-time Stream processing

Just about the hottest topic in Big Data and large-scale sites: real-time processing of data in pipelines. There is an interesting architectural pattern called "lambda architecture", (not to confuse with Amazon's new lambda architecture, which is equally exciting..) which allows both batch and real-time processing of data. It is worth to take a look at this and see, what we can do with this kind of system.