Avoid cloning the module when inlining the last instance that references
that module. This saves a lot of memory because it saves cloning
singleton modules (those with a single instance), which we always
inline. The top few levels of the hierarchy are often simple wrappers,
including the one added by Verilator in V3LinkLevel::wrapTop. Cloning
these and putting off deleting the originals can be very expensive
because they often have a lot of contents inlined into them, so each
layer of wrapper that is inlined would essentially add a whole new clone
of the large top-level. Directly inlining the module for the last cell
without cloning saves us from all this duplicate memory consumption and
also from having to create the clones in the first place.
Also added minor traversal speedups
This reduces the memory consumption of V3Inline by 80% and peak memory
consumption of Verilator by about 66% on a large design, while speeding
up the V3Inline pass by ~3.5x and the whole of Verilator by ~8% while
producing identical output.
- More efficient comparison by pre-computing sorting keys.
- Remove work items in algorithms known to be redundant earlier.
This greatly reduces data structure sizes.
- Use V3GraphVertex->user() for state tracking instead of unordered_map
while both of these are constant time, they do add up.
- In `makeMinSpanningTree`, instead of batch inserting outgoing edges of
each visited vertex into an ordered set, keep an ordered set of sorted
vectors of edges. This reduces the size of the ordered set
significantly (it is now O(V) rather than O(E), and as the subject
graph is a complete graph, V ~ sqrt(E), so this is a significant gain).
- Use a vector + sorting in `perfectMatching` instead of an ordered set.
This is faster on large working sets.
This yields 3.8x speedup on the variable order pass and overall 14%
verilation speed gain on a large design.
Repeatedly traversing whole modules in emit (due to file splitting)
looking for `systemc_* sections can add up to a lot of time on large
designs that have been flattened and need to be split into many files.
Assuming `systemc_* is a rarely used feature, just don't bother if we
don't need to. This gain 9% verilation speed improvement on a large
benchmark.
* Tests: Add t_hier_block_sc_trace(fst|vcd) that tests tracing hierarchical block on SystemC.
* Add a check that elaboration is done before a trace file is opened.
* Add a check that elaboration is done before trace() is called to verilated SystemC model.
* Tests: call sc_core::sc_start(sc_core::SC_ZERO_TIME) before opening a trace file
* Tests: Fix t_trace_two_sc to call sc_start before opening trace
* Use vl_fatal as suggested in PR review.
Trace initialization (tracep->decl* functions) used to explicitly pass
the complete hierarchical names of signals as string constants. This
contains a lot of redundancy (path prefixes), does not scale well with
large designs and resulted in .rodata sections (the string constants) in
ELF executables being extremely large.
This patch changes the API of trace initialization that allows pushing
and popping name prefixes as we walk the hierarchy tree, which are
prepended to declared signal names at run-time during trace
initialization. This in turn allows us to emit repeat path/name
components only once, effectively removing all duplicate path prefixes.
On SweRV EH1 this reduces the .rodata section in a --trace build by 94%.
Additionally, trace declarations are now emitted in lexical order by
hierarchical signal names, and the top level trace initialization
function respects --output-split-ctrace.
* Add a test to reproduce #3197
* Fix#3197. Optimize correctly even if a variable is >32
* Quick exit instead of continue. No functional change is intended.
* Delete comment-out line.
* update per review comment
IEEE 1800-2017 6.11.3 says these types are unsigned. Until now these
types were treated as not having a signedness (NOSIGN), and nodes having
these types were later resolved by V3Width to be unsigned. This is a bit
problematic when creating nodes of these types after V3Width. Treating
these types as unsigned from the get go is fine, and actually improves
generated code slightly.
* Add a test to reproduce bug3182. Run the same HDL with -Oo to confirm the result is same.
* Hopefully fix#3182. The result can be 0 only when polarity is true (no AstNot is found during traversal).