1. "Verifying the prescan specification"

This experiment demonstrates that GPUVerify with Barrier Invariants can verify
the monotonic specification of the Blelloch prescan implementation.

2. "Scaling verification through an abstract operator"

This experiment demonstrates that introducing an abstract-add operator
can lead to better scaling, but at the cost of having to find appropriate
associativity rewrite rules.

---+ FILES
  - kernel.cl
  contains the prescan implementation from Blelloch, as given in Figure 2.
  - specs/*.h
  contain the barrier invariants, as given in Section 4.
  - assoc-{8,16,32}.bpl
  contain the associativity rewrite rules for verifying the upsweep when using
  an abstract operator.

---+ REPRODUCING THE EXPERIMENTAL RESULTS

The graph in Figure 8(a) gives results for the following parameters:

number of threads = 2,4,8,...,128
bitwidth          = 32,16,8
operator          = add,max,or

A single configuration can be verified by running the following script:

> python fig8a.py --bitwidth=[32|16|8] --op=[add|max|or] nthreads

This runs verifications on three parts of the Blelloch prescan, which in
combination give the specification result. Summing the times of the three parts
gives the times reported in Figure 8(a). The three parts are:
   - the upsweep loop
   - the downsweep loop
   - the end specification

The graph in Figure 8(b) gives results for the following parameters:

number of threads = 2,4,8,...,128
bitwidth          = 32
operator          = add,max,or,abstract

As for Figure 8(a), a single configuration can be verified by running:

> python fig8b.py --op=[add|max|or|abstract] nthreads
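
The full parameter grids for both figures can be enumerated mechanically. The
sketch below only builds the command lines; it assumes the scripts accept
exactly the flags shown above, and the helper name build_commands is
hypothetical rather than part of this distribution.

```python
from itertools import product

# Thread counts 2, 4, 8, ..., 128, as listed for both figures.
THREAD_COUNTS = [2 ** k for k in range(1, 8)]


def build_commands(script, ops, bitwidths=None):
    """Return one argument list per configuration (hypothetical helper)."""
    commands = []
    if bitwidths is None:
        # fig8b.py is shown without a --bitwidth flag.
        for op, n in product(ops, THREAD_COUNTS):
            commands.append(["python", script, f"--op={op}", str(n)])
    else:
        for bw, op, n in product(bitwidths, ops, THREAD_COUNTS):
            commands.append(
                ["python", script, f"--bitwidth={bw}", f"--op={op}", str(n)]
            )
    return commands


fig8a = build_commands("fig8a.py", ["add", "max", "or"], bitwidths=[32, 16, 8])
fig8b = build_commands("fig8b.py", ["add", "max", "or", "abstract"])

for cmd in fig8a + fig8b:
    print(" ".join(cmd))
```

Each argument list can be passed to subprocess.run to launch the corresponding
verification. Larger thread counts may take considerably longer, so running
the configurations one at a time is the safer default.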

As noted in Section 5, we can also use an abstract operator that uses axioms to
encode associativity. However, we only present results for the upsweep and
downsweep parts, omitting the end specification as we did not find scalable
rewrite rules for this part of the prescan.
