This webpage contains additional material for the OOPSLA 2015 paper Remote-Scope Promotion: Clarified, Rectified, and Verified by John Wickerson, Mark Batty, Bradford M. Beckmann, and Alastair F. Donaldson.
The following instructions should enable reproduction of the claims in Sections 3 and 4 of the paper that (a) we have extended the Herd simulator with support for remote-scope promotion, (b) this allows investigation of the remote-scope promotion litmus tests, confirming the issues we reported in the paper, and (c) one of these tests captures the bug we found in the work-stealing queue implementation.
herd and the password is letmein.)herdtools directory, run make. This assumes that OCaml, version 4.00 or greater, is installed. herd directory, run ./herd ‑help to check that Herd has been built correctly.herd/testsuite/RSPTests directory contains 14
.litmus tests, 12 of which correspond to the test
suite of the original RSP developers, and 2 of which were
extracted from the work-stealing queue implementation of Orr et
al.herd directory, the
opencl_rem.cat file contains our formalisation of the OpenCL+RSP memory model.
P.litmus with respect to a model
M.cat, run
./herd ‑initwrites true ‑model M.cat P.litmus. For
example, to simulate the first RSP litmus test, run the following command.
./herd -initwrites true -model opencl_rem.cat testsuite/RSPTests/RSP_Test1.litmusThis should produce output that includes the following lines.
States 16 Positive: 1 Negative: 15 Bad executions (0 in total):This output indicates that Herd has found 16 executions of this test that are consistent with the memory model, that one of these executions satisfies the litmus test's postcondition, and that none of the executions are racy. The postcondition of a litmus test expresses a condition that is expected never to hold, so this output tells us that the test's postcondition is too strong: it is forbidding outcomes that OpenCL+RSP actually allows.
gv and graphviz installed, you can make use of the
graphical user interface that Herd provides. To see the 16
executions of the first RSP litmus test, run the following
command.
./herd -model opencl_rem.cat -initwrites true -show all -gv -graph columns -showinitwrites false -squished true -scale 2 -symetric dr testsuite/RSPTests/RSP_Test1.litmus
We list here the results that we obtained. We report 'postcondition
violated' when Herd's output includes Positive: n for
some n>0. We report 'races found' when Herd's output
includes Bad executions (n in total) for some
n>0. We also report the approximate time taken, in seconds, to
simulate the test on a 3.1 GHz MacBook Pro. These results can be recreated on our virtual machine (see link above) by running the script runtests.sh in the home directory.
| Test | Postcondition violated? | Races found? | Simulation time /s |
| RSPTest1 | yes | no | 0 |
| RSPTest2 | no | no | 0 |
| RSPTest3 | no | no | 0 |
| RSPTest4 | no | yes | 0 |
| RSPTest5 | no | no | 0 |
| RSPTest6 | no | no | 0 |
| RSPTest7 | yes | yes | 0 |
| RSPTest8 | no | no | 145 |
| RSPTest9 | no | yes | 119 |
| RSPTest10 | no | no | 0 |
| RSPTest11 | yes | no | 34 |
| RSPTest12 | none given | yes* | 0 |
| WSQ1 | no | no | 0 |
| WSQ2 | no | yes | 0 |
In summary, tests RSPTest1 and RSPTest11 have postconditions that rule out behaviours that OpenCL+RSP actually allows, while tests RSPTest4, RSPTest7 and RSPTest9 all have inadvertent data races. The data race in test WSQ2 corresponds to the bug discovered in the work-stealing queue implementation of Orr et al (Section 4.3).
The following instructions should enable reproduction of the claim, in Section 5 of the paper, that we have formalised Orr et al.'s implementation of remote-scope promotion in Isabelle. Our formalisation comprises a mathematical model of a simple GPU device, the semantics of a minimal assembly language for this device, and a scheme for compiling OpenCL+RSP to this assembly language.
We have formalised in Isabelle the statement of the soundness theorem for the implementation of OpenCL+RSP, but we have only proved this theorem by hand. We do not claim in the paper to have mechanised the proof.
RSP_helper.thyRSP_common.thyRSP_high_level_PL.thy) and the low-level language (RSP_low_level_PL.thy). It defines memory locations, the structure of programs, and a generic notion of a program execution. A key definition in this file is lbl, which is the type of node labels on program execution graphs. The lbl type represents memory events such as reads (R), writes (W) and read-modify-writes (RMW). These three events exist for both high-level (OpenCL) programs and low-level (assembly) programs. The lbl type accomodates further kinds of memory event through its 'tau_evt parameter. In the low-level case, 'tau_evt is instantiated to range over events representing cache flushes and invalidations; in the high-level case, 'tau_evt is simply the unit type. The lbl type also allows additional fields to be attached to read/write/RMW events via its 'extra_info parameter. In the high-level case, 'extra_info is instantiated to carry the memory scope of the event, and whether the event is remote; in the low-level case, 'extra_info is simply the unit type.RSP_high_level_PL.thyH_sem_ins, which defines the execution graphs of a single instruction. As an illustrative example, Lemma mp_example shows how the executions of a simple message-passing program are obtained.RSP_opencl_mm.thyopencl_rem.cat. The key definitions are H_consistent, which is a predicate that holds when a given execution is consistent, faulty, which holds when a given execution has a race, and H_behaviours, which returns the set of allowed executions of a given OpenCL program.RSP_state.thystore, which represents the contents of a memory; that is, a mapping from addresses to values. The store type accepts an additional 'tag, which represents extra information associated with each address. When the store is representing the contents of a cache (c.f. the cache type), the 'tag parameter tracks whether each cache entry is valid and/or dirty. We make further use of the 'tag parameter to insert pieces of 'ghost information' into the machine state. Ghost data is written by the machine, but cannot be read by it; it is used in the soundness proof.RSP_low_level_PL.thyRSP_state.thy. A key definition is env_trans, which defines the state transitions that can be caused by the environment at any time during a program's execution. This corresponds to Table 2 ("Environmental transitions") in the paper. The most important definition in this file is Lred_ins, which defines the semantics of a given instruction using a small-step style. This corresponds to Table 1 ("Semantics of assembly instructions") in the paper. However, Lred_ins does more than just define the 'next state' relation as in Table 1; it also builds up the program's execution graph (the X component) and keeps track of the last-used event identifier (the e component). To clarify: the semantics of the language is specified by the sequence of state transitions, but it is useful for the soundness proof to construct also an execution graph as the program is evaluated.RSP_compiler.thycompile_expr and compile_ins, which are responsible for compiling expressions and instructions, respectively. These correspond to Table 3 ("Compilation schemes, original and proposed") in the paper. Lemma compiling_example provides an example of the compilation process on a simple program. Note that the Old and New constants are used to refer to the original (buggy) and the proposed (fixed) compilation schemes, respectively.RSP_compiler_exn.thycompile_pre_exn function will produce a corresponding execution graph of low-level memory events. The compile_evt_lbl function is responsible for the mapping from high-level memory events to (lists of) low-level memory events. The compile_pre_exn function is not actually constructive; rather, it can only be used to check that a given low-level execution is a valid compilation of a given high-level execution.RSP_soundness.thysoundness. A hand-written proof of this theorem
is contained in the paper's appendix.