This patch adds IEEE-1800 compliant scheduling support for the Inactive
scheduling region used for #0 delays.
Implementing this requires that **all** IEEE-1800 active region events
are placed in the internal 'act' section. This has simulation
performance implications. It prevents some optimizations (e.g.
V3LifePost), which reduces single threaded performance. It also reduces
the available work and parallelism in the internal 'nba' section, which
reduced the effectiveness of multi-threading severely.
Performance impact on RTLMeter when using scheduling adjusted to support
proper #0 delays is ~10-20% slowdown in single-threaded mode, and ~100%
(2x slower) with --threads 4.
To avoid paying this performance penalty unconditionally, the scheduling
is only adjusted if either:
1. The input contains a statically known #0 delay
2. The input contains a variable #x delay unknown at compile time
If no #0 is present, but #x variable delays are, a ZERODLY warning is
issued advising the use of '--no-runtime-zero-delay' which is a promise
by the user that none of the variable delays will evaluate to a zero
delay at run-time. This warning is turned off if '--runtime-zero-delay'
is explicitly given. This is similar to the '--timing' option.
If '--no-runtime-zero-delay' was used at compile time, then executing
a zero delay will fail at runtime.
A ZERODLY warning is also issued if a static #0 if found, but the user
specified '--no-runtime-zero-delay'. In this case the scheduling is not
adjusted to support #0, so executing it will fail at runtime. Presumably
the user knows it won't be executed.
The intended behaviour with all this is the following:
No #0, no #var in the design (#constant is OK)
-> Same as current behaviour, scheduling not adjusted,
same code generated as before
Has static #0 and '--no-runtime-zero-delay' is NOT given:
-> No warnings, scheduling adjusted so it just works, runs slow
Has static #0 and '--no-runtime-zero-delay' is given:
-> ZERODLY on the #0, scheduling not adjusted, fails at runtime if hit
No static #0, but has #var and no option is given:
-> ZERODLY on the #var advising use of '--no-runtime-zero-delay' or
'--runtime-zero-delay' (similar to '--timing'), scheduling adjusted
assuming it can be a zero delay and it just works
No static #0, but has #var and '--no-runtime-zero-delay' is given:
-> No warning, scheduling not adjusted, fails at runtime if zero delay
No static #0, but has #var and '--runtime-zero-delay' is given:
-> No warning, scheduling adjusted so it just works
Use uint32_t max value instead of zero as sentinel value for a trace
code being unassigned. Prep for follow on patch.
Note the actual trace file will still start codes from one, the codes
in the model are just an offset from the base code.
PR #6704 introduced the getForceControlSignals function to
verilated_vpi.cpp. It returns a pair of vpiHandles. These handles were
not released, causing a memory leak. This commit fixes this, in addition
to other minor changes for speed and readability that did not make it
into #6704.
No functional change intended.
* logging for the unsatisfied constraints
* Apply 'make format'
* fix teh quote error in the array indexing
* Apply 'make format'
* Len change for the hash for randomity when named assertion is used
* seperate name assertion and satisfied case
* Apply 'make format'
* simply comments and display info
* refine code and fix protect case
* format
* update display in test and .out file
* add an enable flag and warning type, add a protect_id version test and update out files
* Apply 'make format'
* simplify some comments
* update out file, ready to be merged.
* update .py file to set the hash key solid
* rename and reformate the warning message to follow the verilator style
* add a nowarn test
* Apply 'make format'
* ordering
---------
Co-authored-by: Udaya Raj Subedi <075bei047.udaya@pcampus.edu.np>
Co-authored-by: github action <action@example.com>
The default stack size of secondary thread on macOS is 512k, which is
too small even to run some of the tests. Unfortunately changing the
thread size must happen via `pthred_create` attributes, which are not
available via the c++ threading APIs. Use pthreads directly on macOS,
and set the worker thread sizes to the same as the main thread stack.
The Syms class can contain a very large number of VeriltedScope
instances if `--vpi` is used, all of which need a call to the default
constructor in the constructor of the Syms class. This can lead to very
long compilation times, even without optimization on some compilers.
To avoid the constructor calls, hold VeriltedScope via pointers in the
Syms class, and explicitly new and delete them in the Syms
constructor/destructor. These explicit new/delte can then be
automatically split up into sub functions when the Syms
constructor/destructor become large.
Regarding run-time performance, this should have no significant effect,
most interactions are either during construction/destruction of the Syms
object, or are via pointers already. The one place where we used to
refer to VerilatedScope instances is when emitting an AstScopeName for
things like $display %m. For those there will be an extra load
instruction at run-time, which should not make a big difference.
Patch 3 of 3 to fix long compile times of the Syms module in some
scenarios.
In order to avoid long compile times of the Syms constructor due to
having a very large number of member constructor sto call, move to using
explicit ctor/dtor functions for all but the root VerilatedModule. The
root module needs a constructor as it has non-default-constructible
members. The other modules don't.
This is only part of the fix, as in order to avoid having a default
constructor call the VerilatedModule needs to be default constructible.
I think this is now true for modules that do not contain strings or
other non trivially constructible/destructible variables.
Patch 1 of 3 to fix long compile times of the Syms module in some
scenarios.
AstMTaskBody is somewhat redundant and is problematic for #6280. We used
to wrap all MTasks in a CFunc before emit anyway. Now we create that
CFunc when we create the ExecMTask in V3OrderParallel, and subsequently
use the CFunc to represent the contents of the MTask. Final output and
optimizations are the same, but internals are simplified to move
towards #6280.
No functional change.