About PGI CUDA x86
PGI CUDA x86 uses OpenMP internally. When it compiles CUDA C in optimized mode, it generates one thread per block.
- Optimized mode
- One thread per block
- Non-Optimized mode
- blockDim tasks run on N threads across N cores.
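The optimized-mode mapping can be pictured with a small C sketch. This is only an illustration of the "one thread per block" idea, not the code the PGI compiler actually emits. The names vec_add_block and vec_add_launch are hypothetical, and the choice of a vector-add kernel is an assumption made for the example. Each block becomes one iteration of an OpenMP parallel loop, and the CUDA threads inside the block collapse into an ordinary serial loop run by that single CPU thread.

```c
#include <assert.h>

/* Hypothetical body of one CUDA block. The loop over thread_idx stands in
   for the CUDA threads of the block, all executed by one CPU thread. */
static void vec_add_block(int block_idx, int block_dim,
                          const float *a, const float *b, float *c, int n)
{
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
        /* Same global index a CUDA kernel would compute from
           blockIdx.x * blockDim.x + threadIdx.x. */
        int i = block_idx * block_dim + thread_idx;
        if (i < n)
            c[i] = a[i] + b[i];
    }
}

/* Hypothetical stand-in for a kernel launch <<<grid_dim, block_dim>>>.
   OpenMP distributes the blocks over the available CPU threads; without
   OpenMP enabled the pragma is ignored and the blocks run serially. */
void vec_add_launch(int grid_dim, int block_dim,
                    const float *a, const float *b, float *c, int n)
{
    assert(grid_dim > 0 && block_dim > 0);

    #pragma omp parallel for schedule(static)
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx)
        vec_add_block(block_idx, block_dim, a, b, c, n);
}
```

Calling vec_add_launch(3, 4, a, b, c, 10) would correspond to launching 3 blocks of 4 threads over 10 elements. Because the inner loop is serial, a kernel that relies on __syncthreads() would need extra handling that this sketch does not attempt.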