emibench: EMI testing for real-world OpenCL programs
====================================================

We provide 10 OpenCL benchmarks drawn from the Parboil [0] and Rodinia [1]
suites. Each benchmark has been modified to allow EMI (equivalence modulo
inputs) testing. That is, the OpenCL kernel contains dead-by-construction
blocks that can be injected at online compilation time. Each benchmark has an
expected output that can be used to detect possible miscompilations.
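The core idea can be sketched in a few lines of Python (a toy illustration
with hypothetical names, not emibench code): a block guarded by runtime data
that is initialised so the guard can never fire is dynamically dead, so
injecting it must not change the program's output.

```python
# Toy illustration of EMI testing (not emibench code): emi_data is
# initialised so the guard on the injected block can never fire, making the
# block dead by construction.

def kernel(values, emi_data, emi_block=None):
    """Toy 'kernel' that sums its input."""
    total = 0
    for v in values:
        # Always false at runtime: emi_data[0] is initialised to 0.
        if emi_block is not None and emi_data[0] > 1:
            total = emi_block(total)  # dynamically unreachable
        total += v
    return total

emi_data = [0]  # chosen so the guard can never be true
reference = kernel([1, 2, 3], emi_data)  # empty substitution
variant = kernel([1, 2, 3], emi_data, emi_block=lambda t: t * 31)
# Any mismatch would signal a possible miscompilation.
print("OKAY" if variant == reference else "MISMATCH")  # prints: OKAY
```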

[0] J. A. Stratton, C. Rodrigues, I. J. Sung, N. Obeid, L. W. Chang, N.
Anssari, G. D. Liu, and W. W. Hwu. Parboil: A revised benchmark suite for
scientific and commercial throughput computing. Technical Report
IMPACT-12-01, University of Illinois at Urbana-Champaign, March 2012.

[1] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S. Lee, and K.
Skadron. Rodinia: A benchmark suite for heterogeneous computing. In Proceedings
of the 2009 IEEE International Symposium on Workload Characterization, IISWC
2009, October 4-6, 2009, Austin, TX, USA, pages 44–54. IEEE, 2009.

Structure
=========

Each benchmark has a src/ directory containing the modified application. In
particular, the directories src/emi0 and src/emi1 contain EMI blocks (randomly
generated CLSmith code blocks) that can be injected at online compilation time.
We provide 125 EMI block variants (as header files 0.h, 1.h, ..., 124.h) for
testing, plus the empty substitution (999.h). Wrapper Python scripts are
provided to perform this testing automatically.

The common/ directory contains the EMI library functions mentioned in the
paper for (a) allocating and initialising the dead array (called emi_data in
emibench), which ensures that the EMI blocks are dynamically unreachable; and
(b) compiling OpenCL kernels with (1) a given EMI block, (2) substitutions
enabled or disabled (see "Injecting into real-world kernels" in Sec. 5) and
(3) compiler optimisations enabled or disabled. These options can all be
specified on the command line to our wrapper scripts.
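As a rough sketch, the mapping from these three options to OpenCL build flags
(as echoed by the wrapper scripts in the example runs later in this README)
looks like the following; build_flags is an illustrative helper, not the
actual common/ code:

```python
# Illustrative helper (not the actual common/ code) mapping the three test
# options to the OpenCL compiler flag string printed by the wrapper scripts.

def build_flags(emi_block, substitutions, optimisations):
    flags = ["-I", "src"]
    if not substitutions:
        flags += ["-D", "NO_SUBSTITUTION"]
    if not optimisations:
        flags.append("-cl-opt-disable")
    flags += ["-D", "EMI_BLOCK=%d" % emi_block]
    return " ".join(flags)

print(build_flags(999, substitutions=False, optimisations=False))
# prints: -I src -D NO_SUBSTITUTION -cl-opt-disable -D EMI_BLOCK=999
```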

The results/ directory contains raw CSV files generated by running EMI
testing over different configurations. The CSV files for some configurations
have been anonymized.

The bugs/ directory contains reduced test cases for the wrong-code errors we
found. Certain bugs have been anonymized.

Prerequisites
=============

* An OpenCL enabled platform, device and accompanying SDK including OpenCL
  headers and libraries.
* Python >= 2.7.3

* Linux: Make and g++
* Windows: Visual Studio

Building under Linux
====================

Assuming a bash shell, export the path to your OpenCL headers and libraries
so that $CLDIR/include/CL/cl.h and $CLDIR/lib/libOpenCL.so are correct.

  $ export CLDIR=/path/to/opencl-install

Now try building the first benchmark, parboil_bfs:

  $ cd parboil_bfs
  $ make

If this succeeds, ./build/bfs should now exist. Other benchmarks can be built
similarly. Assuming that you are in the root directory of emibench, run the
following to compile all of them:

  $ ./linux_compile_all.sh

Move onto "Running an EMI test", below.

Building under Windows
======================

First set the environment variables %OCL_INCLUDE% and %OCL_LIB%
so that %OCL_INCLUDE%/CL/cl.h and %OCL_LIB%/OpenCL.dll are correct.
For example, if using the AMD SDK:

  > set OCL_INCLUDE="C:\Program Files (x86)\AMD APP SDK\2.9-1\include"
  > set OCL_LIB="C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86\OpenCL.lib"

You may find the script "amd_set_windows_environment_vars.bat" useful to
modify. Now try building the first benchmark, parboil_bfs:

  > cd parboil_bfs
  > windows_compile.bat

If this succeeds, ./build/bfs.exe should now exist. Other benchmarks can be
built similarly. Assuming that you are in the root directory of emibench, run
the following to compile all of them:

  > windows_compile_all.bat

Move onto "Running an EMI test", below.

Running an EMI test
===================

Change directory into a benchmark, say parboil_bfs:

  $ cd parboil_bfs

Run an EMI test by using the wrapper script:

  $ python ../scripts/runsingle.py --optimisations 0 --substitutions 0 --emi_block 0 --verbose

This will invoke the benchmark to generate the expected output (using the
empty EMI block), then invoke the benchmark with the given EMI block (0),
with substitutions off and optimisations disabled. You should see output
similar to:

  ./build/bfs -i data/1M/input/graph_input.dat -o bfs.out --optimisations 0 --substitution 0 --emi_block 999
  # OpenCL compiler flags are [ -I src -D NO_SUBSTITUTION -cl-opt-disable -D EMI_BLOCK=999]
  Starting GPU kernel
  GPU kernel done
  IO        : 1.795649
  Copy      : 0.640985
  Driver    : 0.286905
  Timer Wall Time: 2.723589


  ./build/bfs -i data/1M/input/graph_input.dat -o bfs.out --optimisations 0 --substitution 0 --emi_block 0
  # OpenCL compiler flags are [ -I src -D NO_SUBSTITUTION -cl-opt-disable -D EMI_BLOCK=0]
  Starting GPU kernel
  GPU kernel done
  IO        : 1.856676
  Copy      : 0.739039
  Driver    : 0.332579
  Timer Wall Time: 2.928346

  OKAY

The OKAY flag means that the output matched the expected output.
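The check itself amounts to comparing the run's output file against the
expected output; a minimal sketch (hypothetical helper, not the actual script
code):

```python
# Hypothetical sketch of the pass/fail check: compare the benchmark's output
# file against the expected output byte-for-byte.

def check_output(expected_path, actual_path):
    with open(expected_path, "rb") as exp, open(actual_path, "rb") as act:
        return "OKAY" if exp.read() == act.read() else "MISMATCH"
```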

* Targeting a different platform/device

By default the wrapper scripts target platform 0, device 0. To target a
different device, pass --platform X and --device Y flags to runsingle.py,
like so:

  $ python ../scripts/runsingle.py --optimisations 0 --substitutions 0 --emi_block 0 --verbose -- --platform X --device Y

NB: the "--" separates the wrapper script's own flags from the --platform and
--device flags, which are passed directly to the benchmark executable.
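The "--" convention can be sketched as follows (hypothetical code, not the
actual script): everything after the separator is forwarded untouched to the
benchmark binary.

```python
# Split a wrapper invocation at "--": flags before it are consumed by the
# Python script, flags after it are forwarded to the benchmark executable.
argv = ["--optimisations", "0", "--emi_block", "0",
        "--", "--platform", "1", "--device", "0"]
sep = argv.index("--")
wrapper_args, benchmark_args = argv[:sep], argv[sep + 1:]
print(benchmark_args)  # prints: ['--platform', '1', '--device', '0']
```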

Running all EMI tests
=====================

Assuming that runsingle.py worked above, the other benchmarks (compiled
similarly) can be tested using the same wrapper script. To test everything,
invoke the script ./root_runall.py in the root directory:

  $ python -u root_runall.py
  # benchmark, opt, sub, emiblock, result
  bfs,0,0,0,OKAY
  bfs,0,0,1,OKAY
  [...]

This writes CSV output to stdout, exhaustively testing all 500 possible EMI
injections per benchmark (125 EMI blocks x substitutions on/off x
optimisations on/off). As noted in the paper, we skip the spmv and myocyte
benchmarks due to data races.
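The configuration space covered per benchmark can be sketched as follows
(counts taken from this README):

```python
# 125 EMI blocks x optimisations on/off x substitutions on/off = 500
# configurations per benchmark.
import itertools

emi_blocks = range(125)  # header files 0.h .. 124.h
optimisations = (0, 1)
substitutions = (0, 1)
configs = list(itertools.product(optimisations, substitutions, emi_blocks))
print(len(configs))  # prints: 500
```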

To target a different platform use the same flags as for runsingle.py:

  $ python -u root_runall.py -- --platform X --device Y

NB: the "--" separates the root_runall.py flags from the --platform and
--device flags, which are passed directly to the benchmark executables.
