Instead of using the number of processors in the host, use the number of
processors available to the process, respecting cpu affinity
assignments. Without pthreads, fall back and use the number of
processors in the host as before.
This is now applied everywhere so runing `nuamctl -C 0-3 verilator` or
`numactl -C 0-3 Vsim` should behave as if the host has 4 cores (e.g.
like in CI jobs)