Verification of the `best' scan from the PUG paper / CUDA SDK.

Same assumptions as the work-efficient version (fixed block size and fixed input size).

$ FrontEnd -o kernel.cu.gbpl kernel.cu && sed -i -e 's/BOOL_LNOT((__enabled/(!(__enabled/' kernel.cu.gbpl
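
The sed step replaces each occurrence of BOOL_LNOT((__enabled with (!(__enabled in the generated file. A minimal sketch of that substitution on a single line follows. The sample line is hypothetical and only illustrates the pattern; the real FrontEnd output may look different.

```shell
# Hypothetical sample line, made up for illustration only.
sample='assert BOOL_LNOT((__enabled_1));'

# Same substitution as in the command above, applied to stdin
# instead of editing kernel.cu.gbpl in place.
fixed=$(printf '%s\n' "$sample" | sed -e 's/BOOL_LNOT((__enabled/(!(__enabled/')

printf '%s\n' "$fixed"
# prints: assert (!(__enabled_1));
```

Without -E, sed treats the parentheses in the pattern as literal characters, so no escaping is needed.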

There is a problem with the variables [ai] and [bi], which are used in two
different scopes (at kernel level and within inner blocks of loops). To see the
problem, compare the output of:
$ FrontEnd kernel.cu
and
$ FrontEnd kernel.cu -D RENAME

