Add VERILATOR_NUMA_STRATEGY environment variable (#6826) (#6880)

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
This commit is contained in:
Yangyu Chen 2026-01-06 23:20:57 +08:00 committed by GitHub
parent f9f7a7146d
commit 2ba96536e6
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
5 changed files with 73 additions and 2 deletions

View File

@ -1,6 +1,8 @@
.. Copyright 2003-2026 by Wilson Snyder.
.. SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
.. _Environment:
Environment
===========
@ -89,6 +91,20 @@ associated programs.
If set, the command to run when using the :vlopt:`--gdb` option, such as
"ddd". If not specified, it will use "gdb".
.. option:: VERILATOR_NUMA_STRATEGY
If set, controls NUMA assignment strategy for Verilator's thread pool
for Verilated simulations at runtime.
Possible values are:
* Empty(``""``) or ``"default"``: Enables NUMA assignment that prioritizes
assigning Verilator threads to physical cores.
* ``"none"``: Disables NUMA assignment. Let the operating system handle
thread scheduling.
Other values may be supported in future releases.
.. option:: VERILATOR_ROOT
The ``VERILATOR_ROOT`` environment variable is used in several places:

View File

@ -85,8 +85,11 @@ above documentation for these options.
If using Verilated multithreaded, consider overriding Verilator's default
thread-to-processor assignment by using ``numactl``; see
:ref:`Multithreading`. Also, consider using profile-guided optimization;
see :ref:`Thread PGO`.
:ref:`Multithreading`. If your OS can handle thread assignment for your
design and hardware well, consider disabling Verilator's NUMA assignment by
setting the :vlopt:`VERILATOR_NUMA_STRATEGY` environment variable to
``none``; see :ref:`Environment`. Also, consider using profile-guided
optimization; see :ref:`Thread PGO`.
Minor Verilog code changes can also give big wins. You should not have any
:option:`UNOPTFLAT` warnings from Verilator. Fixing these warnings can

View File

@ -285,6 +285,13 @@ schedules threads using multiple hyperthreads within the same physical
core. If there is no affinity already set, on Linux only, Verilator
attempts to set thread-to-processor affinity in a reasonable way.
Some newer Linux kernels handle thread assignment well. If running
Verilator on such a system, automatic thread affinity may not be
beneficial and may even reduce performance. In this case, environment
variable :vlopt:`VERILATOR_NUMA_STRATEGY` may be set to ``none`` to
disable automatic thread affinity. For more information, refer to
:ref:`Environment`.
For best performance, use the :command:`numactl` program to (when the
threading count fits) select unique physical cores on the same socket. The
same applies for :vlopt:`--trace-threads` as well.
@ -311,6 +318,15 @@ adjusted if you want another simulator to use, e.g., socket 1, or if you
Verilated with a different number of threads. To see what CPUs are actually
used, use :vlopt:`--prof-exec`.
On Systems with multiple L3 clusters per socket (e.g., AMD EPYC or Ryzen),
consider using :command:`lstopo` to determine the L3 cluster topology of
the current system and :command:`numactl` to bind CPUs within a single L3
cluster. This can improve performance for minimal communication latency
between threads. Sometimes, for model's thread counts that are more than
the core count per L3 cluster, using SMTs (hyperthreads) within a single L3
cluster can have better performance than spreading across multiple L3
clusters using physical cores only. Experimentation is recommended to find
the best settings for underlying hardware and model characteristics.
Multithreaded Verilog and Library Support
-----------------------------------------

View File

@ -147,6 +147,12 @@ VlThreadPool::~VlThreadPool() {
std::string VlThreadPool::numaAssign() {
#if defined(__linux) || defined(CPU_ZERO) || defined(VL_CPPCHECK) // Linux-like pthreads
std::string numa_strategy = VlOs::getenvStr("VERILATOR_NUMA_STRATEGY", "default");
if (numa_strategy == "none") {
return "no NUMA assignment requested";
} else if (numa_strategy != "default" && numa_strategy != "") {
return "%Warning: unknown VERILATOR_NUMA_STRATEGY value '" + numa_strategy + "'";
}
// Get number of processor available to the current process
const unsigned num_proc = VlOs::getProcessAvailableParallelism();
if (!num_proc) return "Can't determine number of available threads";

View File

@ -45,4 +45,34 @@ for trial in range(0, trials):
# False fails occasionally
# test.file_grep_not(gantt_log, r'%Warning:') # e.g. There were fewer CPUs (1) than threads (3).
if sys.platform != "darwin":
# Test disabling NUMA assignment
gantt_log_numa_none = test.obj_dir + "/gantt_numa_none.log"
test.execute(run_env='VERILATOR_NUMA_STRATEGY=none',
all_run_flags=[
"+verilator+prof+exec+start+2", " +verilator+prof+exec+window+2",
" +verilator+prof+exec+file+" + test.obj_dir + "/profile_exec.dat"
])
test.run(cmd=[
os.environ["VERILATOR_ROOT"] + "/bin/verilator_gantt", "--no-vcd", test.obj_dir +
"/profile_exec.dat", "| tee " + gantt_log_numa_none
])
test.file_grep(gantt_log_numa_none, r'NUMA status += no NUMA assignment requested')
# Test invalid NUMA assignment
gantt_log_numa_invalid = test.obj_dir + "/gantt_numa_invalid.log"
test.execute(run_env='VERILATOR_NUMA_STRATEGY=invalid_value',
all_run_flags=[
"+verilator+prof+exec+start+2", " +verilator+prof+exec+window+2",
" +verilator+prof+exec+file+" + test.obj_dir + "/profile_exec.dat"
])
test.run(cmd=[
os.environ["VERILATOR_ROOT"] + "/bin/verilator_gantt", "--no-vcd", test.obj_dir +
"/profile_exec.dat", "| tee " + gantt_log_numa_invalid
])
# %Warning: unknown VERILATOR_NUMA_STRATEGY value 'invalid_value'
test.file_grep(
gantt_log_numa_invalid,
r"NUMA status += %Warning: unknown VERILATOR_NUMA_STRATEGY value 'invalid_value'")
test.passes()