Note this might miss some cases where a sub-tree within an And/Or/Xor
tree is optimizeable, but not the whole tree, but in practice this seems
to work better than the alternative of keeping a set of failed nodes and
bail early.
Now that we have an efficient algorithm to analyse which bits in a
combinational cycle are not dependent on the cycle, can simplify the
cycle fixup algorithms. Remove FixUpSelDrivers: this was a heuristic
to save on the expensive independent bits analysis, but itself can
cause a performance problem on certain inputs that result in a large
number of attempted fixups. Doing this simplifies the driver tracing
algorithm, and because we now only attempt to trace drivers that are
known to be independent of the cycles, it should always succeed...
Unless of course there is a mismatch between the independent bit
analysis ant the driver tracing algorithm. In such case (when we managed
to prove independence, but then fail to trace a driver), we will crash,
which is still easier to sv-bugpoint than a performance bug.
Fixes#6744
Forceable/externally written variables cannot be used as the canonical
result variable for a Dfg value as the variables value can be
inconsistent with its Dfg drivers (e.g. when forced).
This removes a factor N from DfgBreakCycles, by doing the necessary data
flow analysis for the entire graph up front, and resulting the result for
all subsequent cycle fixups in the current iteration.
Fixes#6731
Combined the 3 various APIs used in EmitC* passes to handle file
opening/splitting into a single one. This removes a lot of copy paste
and makes everything consistent.
All C++ file handling goes through `EmitCBaseVisitor` using the
`openNewOutputHeaderFile`, `openNewOutputSourceFile` and
`closOutputFile` methods.
To emit a new kind of file, always derive a new class from
`EmitCBaseVisitor`, and use the above APIs, they will take care of
everything else in a consistent matter.
Subsequently also removed V3OutSCFile, and instead included
verilated_sc.h (which included the systemc header itself) in the two
files that need it (the primary model header, and the root module
header).
Functional changes:
- The PCH header did not use to have a corresponding AstCFile. Now it
does, though this makes no difference in the output
- All 'slow' sources now have '__Slow' in the name automatically (the
only one missing was for the ConstPool files)
Rest of the output is identical except for the header line now being
present in all generated C++ files.
The Syms class can contain a very large number of VeriltedScope
instances if `--vpi` is used, all of which need a call to the default
constructor in the constructor of the Syms class. This can lead to very
long compilation times, even without optimization on some compilers.
To avoid the constructor calls, hold VeriltedScope via pointers in the
Syms class, and explicitly new and delete them in the Syms
constructor/destructor. These explicit new/delte can then be
automatically split up into sub functions when the Syms
constructor/destructor become large.
Regarding run-time performance, this should have no significant effect,
most interactions are either during construction/destruction of the Syms
object, or are via pointers already. The one place where we used to
refer to VerilatedScope instances is when emitting an AstScopeName for
things like $display %m. For those there will be an extra load
instruction at run-time, which should not make a big difference.
Patch 3 of 3 to fix long compile times of the Syms module in some
scenarios.
Splitting of the Syms constructor/destructor were a bit arbitrarily
enforced with some parts splitable, while others not. There was also an
issue that even if the constructor and destructor bodies were split, we
would still end up with both in the same file that was double the size of
the intended split limit.
To fix, first all statements required in the Syms constructor and
destructor are gathered into a vector, then if the total number of
statements required for both is bigger than the split limit, the
implementations are split into sub-functions, one per file, as before,
ensuring that none of the functions are bigger than the split limit.
Also add __Slow suffix to the names of the files.
Patch 2 of 3 to fix long compile times of the Syms module in some
scenarios.
In order to avoid long compile times of the Syms constructor due to
having a very large number of member constructor sto call, move to using
explicit ctor/dtor functions for all but the root VerilatedModule. The
root module needs a constructor as it has non-default-constructible
members. The other modules don't.
This is only part of the fix, as in order to avoid having a default
constructor call the VerilatedModule needs to be default constructible.
I think this is now true for modules that do not contain strings or
other non trivially constructible/destructible variables.
Patch 1 of 3 to fix long compile times of the Syms module in some
scenarios.
For handling $past and similar functions, we used to collect sampled
values of variables at the beginning of the main _eval function. If we
have many of these, this can grow _eval very large which can make C++
compilation very slow. Apply usual fix of emitting the necessary code in
a separate function and then splitting it based on size.