Must add {:inline 1} to the generated gbpl output.
THREAD_N=192 is very slow because we fall back to using only % (mod) invariants.
If THREAD_N is a power of 2 then we can significantly speed this up.
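
A minimal sketch of why a power-of-2 THREAD_N might help. The notes above do not say where the speedup comes from; this assumes it is the standard identity x % N == x & (N - 1), which holds for non-negative x when N is a power of 2 and would let a bitmask form stand in for the mod form. The helper names is_power_of_two and mod_as_mask are hypothetical and are not part of the tool.

```python
def is_power_of_two(n: int) -> bool:
    """Return True when n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def mod_as_mask(x: int, n: int) -> int:
    """Compute x % n as a bitmask.

    Assumption: only valid for non-negative x and power-of-two n.
    """
    if x < 0 or not is_power_of_two(n):
        raise ValueError("bitmask form needs x >= 0 and n a power of two")
    return x & (n - 1)


# THREAD_N=192 is not a power of two, so only the % form applies.
print(is_power_of_two(192))
# THREAD_N=256 is, so x % 256 and x & 255 agree for every non-negative x.
print(all(mod_as_mask(x, 256) == x % 256 for x in range(1024)))
```

With THREAD_N=192 the check fails (192 = 128 + 64 has two bits set), which matches the slow case noted above.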
