The default stack size of secondary thread on macOS is 512k, which is
too small even to run some of the tests. Unfortunately changing the
thread size must happen via `pthred_create` attributes, which are not
available via the c++ threading APIs. Use pthreads directly on macOS,
and set the worker thread sizes to the same as the main thread stack.
The Syms class can contain a very large number of VeriltedScope
instances if `--vpi` is used, all of which need a call to the default
constructor in the constructor of the Syms class. This can lead to very
long compilation times, even without optimization on some compilers.
To avoid the constructor calls, hold VeriltedScope via pointers in the
Syms class, and explicitly new and delete them in the Syms
constructor/destructor. These explicit new/delte can then be
automatically split up into sub functions when the Syms
constructor/destructor become large.
Regarding run-time performance, this should have no significant effect,
most interactions are either during construction/destruction of the Syms
object, or are via pointers already. The one place where we used to
refer to VerilatedScope instances is when emitting an AstScopeName for
things like $display %m. For those there will be an extra load
instruction at run-time, which should not make a big difference.
Patch 3 of 3 to fix long compile times of the Syms module in some
scenarios.
In order to avoid long compile times of the Syms constructor due to
having a very large number of member constructor sto call, move to using
explicit ctor/dtor functions for all but the root VerilatedModule. The
root module needs a constructor as it has non-default-constructible
members. The other modules don't.
This is only part of the fix, as in order to avoid having a default
constructor call the VerilatedModule needs to be default constructible.
I think this is now true for modules that do not contain strings or
other non trivially constructible/destructible variables.
Patch 1 of 3 to fix long compile times of the Syms module in some
scenarios.
AstMTaskBody is somewhat redundant and is problematic for #6280. We used
to wrap all MTasks in a CFunc before emit anyway. Now we create that
CFunc when we create the ExecMTask in V3OrderParallel, and subsequently
use the CFunc to represent the contents of the MTask. Final output and
optimizations are the same, but internals are simplified to move
towards #6280.
No functional change.
Removed the VlTriggerVec type, and refactored to use an unpacked array
of 64-bit words instead. This means the trigger vector and its
operations are now the same as for any other unpacked array. The few
special functions required for operating on a trigger vector are now
generated in V3SchedTrigger as regular AstCFunc if needed.
No functional change intended, performance should be the same.
Instead of using the number of processors in the host, use the number of
processors available to the process, respecting cpu affinity
assignments. Without pthreads, fall back and use the number of
processors in the host as before.
This is now applied everywhere so runing `nuamctl -C 0-3 verilator` or
`numactl -C 0-3 Vsim` should behave as if the host has 4 cores (e.g.
like in CI jobs)
There was exactly one place in V3Task, handling DPI arguments when we
relied on cleanOut of AstCExpr being false for masking. Made that code
do the relevant masking via a few new run-time functions, which also
eliminates some special cases in the relevant V3Task functions.
* Skip profiling tests on non-glibc platforms
* Enforce dumb terminal in tests
* Include POSIX headers whenever __unix__ macro is defined
* Treat no procfs as normal condition
* Respect MAKE variable when running make