Lecture Series: Assistant Professor, Luis Ceze, Computer Science and Engineering, University of Washington
Lecture Series:
Advances in Multiprocessor Programmability Using Coarse-Grain Execution
Lecturer:
Assistant Professor, Luis Ceze
Place:
National Institute of Informatics, 20F, Meeting Room
Date:
September 10, 11, 14, 15
Time: 3pm - 5pm
Fee: Free
Registration: Not required
Biography:
Luis Ceze is an Assistant Professor in the Computer Science and Engineering Department at the University of Washington. His research focuses on computer architecture, compilers, programming models, and operating systems to improve the programmability and reliability of multiprocessor systems. He has co-authored over 30 papers in these areas, and had three papers selected as IEEE Micro Top Picks. He participated in the
Blue Gene, Cyclops, and PERCS projects at IBM and is a recipient of
several IBM awards, including an IBM PhD Fellowship. He obtained his
PhD in Computer Science from UIUC in 2007 and has received awards for
research and academic accomplishments, including the Ross Martin Award
for Outstanding Research Achievement in the College of Engineering,
the David Kuck Outstanding PhD Thesis Award, and an NSF CAREER Award. He
recently co-founded a startup company where he is a part-time
consultant.
Lecture 1:
"Bulk Disambiguation of Speculative Threads in Multiprocessors"
Transactional Memory (TM), Thread-Level Speculation (TLS), and Checkpointed multiprocessors are three popular architectural techniques based on the execution of multiple, cooperating speculative threads. In these environments, correctly maintaining data dependences across threads requires mechanisms for disambiguating addresses across threads, invalidating stale cache state, and making committed state visible. These mechanisms are both conceptually involved and hard to implement.
In this talk, I will present Bulk, a novel approach to simplify these mechanisms. The idea is to hash-encode a thread’s access information in a concise signature, and then support in hardware signature operations that efficiently process sets of addresses. Such operations implement the mechanisms described. Bulk operations are inexact but correct, and provide substantial conceptual and implementation
simplicity. I will discuss an evaluation of Bulk in the context of TLS using
SPECint2000 codes and TM using multithreaded Java workloads.
Despite its simplicity, Bulk has competitive performance with more
complex schemes.
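A minimal software sketch of the signature idea, for illustration only. Bulk itself is a hardware mechanism; the field count, field width, SHA-256 hash, and all class and function names below are assumptions made for this example and are not taken from the lecture. The sketch shows why signature operations are "inexact but correct": two signatures built from overlapping address sets always test as intersecting, while disjoint sets may occasionally test as intersecting too (a false positive that costs performance, not correctness).

```python
import hashlib

NUM_FIELDS = 2    # hypothetical number of bit-fields per signature
FIELD_BITS = 256  # hypothetical width of each bit-field


def field_positions(addr):
    """Hash an address to one bit position per field (illustrative hash choice)."""
    digest = hashlib.sha256(addr.to_bytes(8, "little")).digest()
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "little") % FIELD_BITS
        for i in range(NUM_FIELDS)
    ]


class Signature:
    """Concise, lossy encoding of a set of addresses."""

    def __init__(self):
        self.fields = [0] * NUM_FIELDS  # each field is an integer bitmask

    def insert(self, addr):
        for i, pos in enumerate(field_positions(addr)):
            self.fields[i] |= 1 << pos

    def may_contain(self, addr):
        # True for every inserted address; may also be True for others.
        return all(
            (self.fields[i] >> pos) & 1
            for i, pos in enumerate(field_positions(addr))
        )

    def intersects(self, other):
        # A shared address sets the same bit in every field of both
        # signatures, so an empty AND in any field proves disjointness.
        return all(a & b for a, b in zip(self.fields, other.fields))


# Usage: compare a committing thread's writes with another thread's reads.
write_sig = Signature()
write_sig.insert(0x1000)

read_sig = Signature()
read_sig.insert(0x1000)
read_sig.insert(0x2000)

conflict = write_sig.intersects(read_sig)  # the sets really overlap
```

Here a detected conflict would be the point at which a speculative thread is squashed; the sketch stops at detection.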
While Sequential Consistency (SC) is the most intuitive memory consistency model and the one most programmers likely assume, current multiprocessors do not support it. Instead, they support more relaxed models that deliver high performance. SC implementations are considered either too slow or, when they can match the performance of relaxed models, too difficult to implement.
In this talk, I will present Bulk Enforcement of SC (BulkSC), a novel way of providing SC that is simple to implement and offers performance comparable to Release Consistency (RC). The idea is to dynamically group sets of consecutive instructions into chunks that appear to execute atomically and in isolation. The hardware enforces SC at the coarse grain of chunks which, to the program,
appears as providing SC at the individual memory access level.
BulkSC keeps the implementation simple by largely decoupling
memory consistency enforcement from processor structures. Moreover,
it delivers high performance by enabling full memory access
reordering and overlapping within chunks and across chunks. I will
describe a complete system architecture that supports BulkSC and
show that it delivers performance comparable to RC.
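A toy model of the chunk idea, for illustration only. It assumes a two-thread litmus test and a fixed chunk size, both chosen for this example, and it models only the "chunks appear atomic" property; conflict detection and chunk squashing are left out. Enumerating every order in which whole chunks can commit shows that the outcomes are always a subset of the outcomes allowed by SC at the level of individual accesses.

```python
# Each operation is ("st", variable, value) or ("ld", variable, register).
T0 = [("st", "x", 1), ("ld", "y", "r0")]
T1 = [("st", "y", 1), ("ld", "x", "r1")]


def chunks(ops, size):
    """Split a thread's operations into consecutive chunks of `size` ops."""
    return [ops[i:i + size] for i in range(0, len(ops), size)]


def interleavings(a, b):
    """Yield every merge of chunk lists a and b that keeps each list's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(order):
    """Execute chunks one at a time, each atomically, and return (r0, r1)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for chunk in order:
        for kind, var, arg in chunk:
            if kind == "st":
                mem[var] = arg
            else:
                regs[arg] = mem[var]
    return regs["r0"], regs["r1"]


def outcomes(chunk_size):
    """All (r0, r1) results reachable with the given chunk size."""
    return {
        run(order)
        for order in interleavings(chunks(T0, chunk_size), chunks(T1, chunk_size))
    }


sc_outcomes = outcomes(1)       # one access per chunk: plain SC interleavings
chunked_outcomes = outcomes(2)  # each thread commits as a single chunk
```

With one access per chunk the model produces (0, 1), (1, 0), and (1, 1); with two accesses per chunk it produces only (0, 1) and (1, 0). The outcome (0, 0), which SC forbids, never appears in either case.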
Current shared memory multicore and multiprocessor systems are nondeterministic. Each time these systems execute a multithreaded application, even if supplied with the same input, they can produce a different output. This frustrates debugging and limits the ability to properly test multithreaded code, becoming a major stumbling block to the much-needed widespread adoption of parallel programming.
In this talk I will make the case for fully deterministic shared memory multiprocessing (DMP). The behavior of an arbitrary multithreaded program on a DMP system is only a function of its inputs. The core idea is to make inter-thread communication fully deterministic. Previous approaches to coping with nondeterminism in multithreaded programs have
focused on replay, a technique useful only for debugging. In
contrast, while DMP systems are directly useful for debugging
by offering repeatability by default, we argue that parallel
programs should execute deterministically in the field
as well. This has the potential to make testing more assuring
and increase the reliability of deployed multithreaded software.
We propose a range of approaches to enforcing determinism
and discuss their implementation trade-offs. I will show
that determinism can be provided with little performance
cost using our architecture proposals on future hardware,
and that software-only approaches can be utilized on
existing systems.
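A minimal software-only sketch of deterministic execution, for illustration only. It uses the simplest scheme one could pick, full serialization in a fixed round-robin order, which is an assumption of this example and not necessarily one of the approaches covered in the lecture. The class and function names are hypothetical. Each thread may touch shared state only during its own turn, so the order of inter-thread communication no longer depends on how the operating system schedules the threads.

```python
import threading


class DeterministicTurns:
    """Grants threads access to shared state in a fixed round-robin order."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.turn = 0
        self.cond = threading.Condition()

    def run_quantum(self, tid, action):
        """Block until it is thread `tid`'s turn, run `action`, pass the turn on."""
        with self.cond:
            self.cond.wait_for(lambda: self.turn == tid)
            action()
            self.turn = (self.turn + 1) % self.num_threads
            self.cond.notify_all()


def run_once(num_threads=3, quanta=4):
    """Run the workers and return the order in which they touched shared state."""
    log = []  # shared state: every thread appends to it
    turns = DeterministicTurns(num_threads)

    def worker(tid):
        # Every thread runs the same number of quanta, so the turn
        # counter never waits on a thread that has already finished.
        for step in range(quanta):
            turns.run_quantum(tid, lambda: log.append((tid, step)))

    threads = [
        threading.Thread(target=worker, args=(tid,)) for tid in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


first = run_once()
second = run_once()
```

Both runs produce the same log regardless of thread start order. The price is that only one thread makes progress at a time, which is the kind of cost the implementation trade-offs above are about.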
Writing shared-memory parallel programs is error-prone. Among the concurrency errors that programmers often face are atomicity violations, which are especially challenging. They happen when programmers make incorrect assumptions about atomicity and fail to enclose memory accesses that should occur atomically inside the same critical section. If these accesses happen to be interleaved with conflicting accesses from different threads, the program might behave incorrectly.
Recent architectural proposals arbitrarily group consecutive dynamic memory operations into atomic blocks to enforce memory ordering at a coarse grain. This provides what we call implicit atomicity, as the atomic blocks are not derived from explicit program annotations. In this talk, I will make the fundamental observation that implicit atomicity probabilistically hides atomicity violations
by reducing the number of interleaving opportunities between memory
operations. We then propose Atom-Aid, which creates implicit
atomic blocks intelligently instead of arbitrarily, dramatically reducing
the probability that atomicity violations will manifest themselves.
Atom-Aid is also able to report where atomicity violations
might exist in the code, providing resilience and debuggability. I
evaluate Atom-Aid using buggy code from applications including
Apache, MySQL, and XMMS, showing that Atom-Aid virtually eliminates
the manifestation of atomicity violations.
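A small counting model of why arbitrary chunking already hides some atomicity violations, for illustration only. It assumes fixed-size chunks and a region of consecutive operations whose position relative to the chunk boundaries is uniformly random; these assumptions, the function names, and the example sizes are choices made for this sketch and are not results from the lecture. A violation can manifest only if the region that should be atomic straddles a chunk boundary.

```python
from fractions import Fraction


def fully_inside_one_chunk(start, d, c):
    """True if operations start..start+d-1 all fall in the same chunk of size c."""
    return start // c == (start + d - 1) // c


def hide_probability(d, c):
    """Fraction of the c possible alignments that keep a d-op region in one chunk."""
    hits = sum(fully_inside_one_chunk(offset, d, c) for offset in range(c))
    return Fraction(hits, c)


# A 2-operation region with 4-operation chunks: only the alignment that
# starts on the last slot of a chunk straddles a boundary.
small = hide_probability(2, 4)

# A region longer than the chunk can never fit inside one chunk.
too_long = hide_probability(5, 4)
```

Under these assumptions the probability works out to (c - d + 1) / c when d is at most c, and 0 otherwise. Placing chunk boundaries intelligently instead of arbitrarily, as Atom-Aid does, amounts to avoiding the alignments that this model counts as misses.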