When a lot of combinational logic is driven from top level inputs,
work can be wasted evaluating that logic if the top level inputs don't
change.
This change adds an optimization that performs change detection on the
top level inputs and evaluates the 'ico' logic only if a top level input
actually changed. This especially helps with --hierarchical/--lib-create,
which run the 'ico' logic of each sub-model in the eval settle loop.
This was observed to yield a 40%+ run-time speedup on some partitioned
designs.
The added change detection is cheap, so it is emitted even if the 'ico'
region is small, and is on by default.
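A minimal sketch of the idea, assuming a hand-written model with two top level inputs. The names (Model, prev_a, prev_b, evalIco, icoRuns) are hypothetical; this is not the code Verilator generates. The 'previous value' copies are compared on every eval, and the 'ico' logic runs only when an input differs from its previous value:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hand-written model with two top level inputs; an
// illustration only, not the code Verilator emits.
struct Model {
    // Top level inputs, written by the testbench
    uint32_t in_a = 0;
    uint32_t in_b = 0;
    // Result of the 'ico' logic
    uint32_t sum = 0;
    // How many times the 'ico' logic actually ran
    unsigned icoRuns = 0;

    // 'Previous value' shadow copies used for change detection
    uint32_t prev_a = 0;
    uint32_t prev_b = 0;
    bool firstEval = true;  // always evaluate on the first call

    void evalIco() {
        ++icoRuns;
        sum = in_a + in_b;  // stands in for expensive combinational logic
    }

    void eval() {
        // Cheap change detection: one compare per input
        const bool changed = firstEval || in_a != prev_a || in_b != prev_b;
        if (changed) {
            prev_a = in_a;
            prev_b = in_b;
            firstEval = false;
            evalIco();
        }
        // Invariant: the shadow copies now mirror the inputs
        assert(prev_a == in_a && prev_b == in_b);
        // ... the remaining eval regions would follow here
    }
};
```

Calling eval() twice with the same inputs runs evalIco() only once; changing in_b afterwards triggers another run.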
The optimization is only sound if the model itself does not write to the
top level inputs; otherwise the 'previous value' variables, which are not
updated by internal writes, would fall out of sync. If we can detect a
top level input is written within the design, then for that input, we
fall back on always running the relevant logic. With --vpi we cannot
prove safety statically, so --vpi disables this optimization unless it is
explicitly enabled, in which case it is the user's responsibility not to
write to top level inputs via the VPI.
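The hazard and the per-input fallback can be sketched with a hypothetical InputTracker (not a Verilator class). If the design writes an input internally, 'prev' is not updated, while the logic driven by that input may already reflect the internally written value. A later testbench write that restores the old value is then wrongly seen as 'no change'. Marking such an input as written internally forces evaluation every time:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-input change detector; not Verilator's generated code.
struct InputTracker {
    uint32_t prev = 0;  // value seen at the last 'ico' evaluation
    bool seen = false;  // false until the first evaluation
    // Set when the design (or, under --vpi, possibly the VPI) may write
    // this input, so 'prev' cannot be trusted.
    bool writtenInternally = false;

    // Returns true if the logic driven by this input must be re-evaluated.
    bool needsEval(uint32_t current) {
        if (writtenInternally) return true;  // fallback: always evaluate
        if (seen && current == prev) return false;
        prev = current;
        seen = true;
        return true;
    }
};
```

For example, after needsEval(1) records 1, an internal write of 0 followed by a testbench write of 1 makes needsEval(1) return false, so the update back to 1 is missed. With writtenInternally set, needsEval always returns true.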
As per discussion, remove the unsound V3SplitAs pass. The
isolate_assignments attribute/directive is now parsed in the frontend for
compatibility, but is otherwise ignored and has no effect.
Fixes #7144
This is still mostly a refactoring of V3Case, but it includes functional changes.
Decouple the exhaustiveness/overlap analysis from the decision to
convert the case using the fast bitwise testing method. This enables
dropping the 'notParallel' assertions for case statements we can prove
exhaustive and unique, even if we decide to convert them using the generic if/else
ladder scheme.
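The decoupling can be sketched as an analysis that only answers 'exhaustive?' and 'unique?', independently of how the case is later lowered. This sketch assumes each case item is a hypothetical value/mask pair (mask bits of 0 are don't-care) and simply enumerates every selector value, which is only practical for narrow selectors; it is not V3Case's actual algorithm:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical case item: matches selector s when (s & mask) == (value & mask).
struct CaseItem {
    uint32_t value;
    uint32_t mask;
};

struct CaseAnalysis {
    bool exhaustive;  // every selector value matches at least one item
    bool unique;      // no selector value matches more than one item
};

// Brute-force analysis over all selector values of the given width.
CaseAnalysis analyzeCase(const std::vector<CaseItem>& items, unsigned width) {
    assert(width <= 16);  // enumeration is only feasible for narrow selectors
    CaseAnalysis result{true, true};
    for (uint32_t s = 0; s < (1u << width); ++s) {
        unsigned matches = 0;
        for (const CaseItem& item : items) {
            if ((s & item.mask) == (item.value & item.mask)) ++matches;
        }
        if (matches == 0) result.exhaustive = false;
        if (matches > 1) result.unique = false;
    }
    return result;
}

// Hypothetical helper: an exhaustive and unique case can never trip the
// 'notParallel' assertions, whichever lowering is chosen afterwards.
bool assertionsProvablyRedundant(const CaseAnalysis& a) {
    return a.exhaustive && a.unique;
}
```

Because the result does not depend on the lowering, the same answer can be used whether the case is converted with the bitwise method or the if/else ladder.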
Found by inspection: case inside used to treat 'x' as a value, not as a
wildcard. Per the standard it should behave as '==?', which treats both
'x' and 'z' in the case item as wildcards.
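A minimal sketch of '==?' matching for a case inside item, assuming a two-state selector and a hypothetical FourState encoding in which 'xz' marks the item's x or z bits. Those positions are simply excluded from the comparison:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical four-state constant: 'xz' marks bit positions holding x or z,
// 'bits' holds the 0/1 value elsewhere (and 0 at x/z positions).
struct FourState {
    uint32_t bits;
    uint32_t xz;
};

// Wildcard equality 'sel ==? item' for a two-state selector: positions where
// the item is x or z are not compared.
bool wildcardEq(uint32_t sel, FourState item) {
    assert((item.bits & item.xz) == 0);  // encoding precondition
    return ((sel ^ item.bits) & ~item.xz) == 0;
}
```

For example, the item 4'b1x0z is FourState{0b1000, 0b0101}; it matches the selectors 4'b1000, 4'b1001, 4'b1100 and 4'b1101, but not 4'b1110.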
V3Gate used to inline too many expensive operations. One particularly
bad example is `{<<{wide}}` (bit-reverse of a wide signal), which we
always used to inline because it is a single-input node, even though it
is quite expensive to compute.
Change the heuristic to inline single-input nodes only if they are not
wide or are cheap wide operations; otherwise treat them the same as
multi-input ops and inline them only if they are used no more than once.
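The revised decision can be sketched as a small predicate. NodeInfo and its fields are hypothetical summaries of a candidate node, not Verilator's AstNode API, and which wide operations count as cheap is left as an input assumption:

```cpp
#include <cassert>

// Hypothetical summary of an inlining candidate.
struct NodeInfo {
    unsigned numInputs;  // number of operands of the driving expression
    bool isWide;         // wider than a machine word
    bool isCheapWide;    // wide, but cheap to compute (assumed classification)
    unsigned numUses;    // number of places the driven variable is read
};

// Narrow (or cheap wide) single-input nodes are always inlined; everything
// else, including expensive wide single-input ops, only if used at most once.
bool shouldInline(const NodeInfo& n) {
    assert(!n.isCheapWide || n.isWide);  // only wide ops can be 'cheap wide'
    if (n.numInputs == 1 && (!n.isWide || n.isCheapWide)) return true;
    return n.numUses <= 1;
}
```

Under this rule a wide bit-reverse read in three places is no longer inlined, while the same node read once still is.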
Add a simple Dfg pass that removes redundant bit selects early. This
can significantly cut down on downstream work and remove some temporary
variables introduced during synthesis.
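The kind of redundancy removed can be sketched on a hypothetical miniature expression type (not Verilator's DfgVertex classes). This sketch assumes two cases count as redundant: a select of a select, which folds into one select of the original operand, and a select that covers its whole operand, which is replaced by the operand itself:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical expression: a named variable, or a bit select src[lsb +: width].
struct Expr {
    std::string var;            // non-empty for a variable leaf
    std::shared_ptr<Expr> src;  // operand, for a select
    unsigned lsb = 0;
    unsigned width = 0;         // result width
};
using ExprP = std::shared_ptr<Expr>;

ExprP mkVar(const std::string& name, unsigned width) {
    auto e = std::make_shared<Expr>();
    e->var = name;
    e->width = width;
    return e;
}

ExprP mkSel(ExprP src, unsigned lsb, unsigned width) {
    assert(lsb + width <= src->width);  // select must stay in range
    auto e = std::make_shared<Expr>();
    e->src = std::move(src);
    e->lsb = lsb;
    e->width = width;
    return e;
}

// Folds nested selects and drops selects that cover their whole operand.
ExprP simplifySel(const ExprP& e) {
    if (!e->src) return e;  // variable leaf: nothing to do
    ExprP src = simplifySel(e->src);
    unsigned lsb = e->lsb;
    if (src->src) {  // select of a select: add the offsets
        lsb += src->lsb;
        src = src->src;
    }
    if (lsb == 0 && e->width == src->width) return src;  // full-width select
    return mkSel(src, lsb, e->width);
}
```

For a 32-bit x, x[8 +: 16][4 +: 8] becomes x[12 +: 8], and x[0 +: 32] becomes x.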
Replace 3 DfgCond patterns with 2 more general ones that convert DfgCond
with common MSBs/LSBs in both branches into a DfgConcat with a narrower
DfgCond. This pattern arises frequently with Dfg synthesis.
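The identity behind the new patterns can be checked on plain integers. This sketch uses hypothetical helper names, 2-bit fields, and unsigned arithmetic standing in for Dfg vertices. It confirms exhaustively that a Cond whose branches share their MSBs (or LSBs) equals a Concat of the shared part with a narrower Cond:

```cpp
#include <cassert>
#include <cstdint>

// {hi, lo} where 'lo' is loWidth bits wide.
uint32_t concat(uint32_t hi, uint32_t lo, unsigned loWidth) {
    assert(lo < (1u << loWidth));  // 'lo' must fit in its field
    return (hi << loWidth) | lo;
}

// Common MSBs: cond ? {a, x} : {a, y}  ==  {a, cond ? x : y}
uint32_t beforeMsb(bool cond, uint32_t a, uint32_t x, uint32_t y, unsigned w) {
    return cond ? concat(a, x, w) : concat(a, y, w);
}
uint32_t afterMsb(bool cond, uint32_t a, uint32_t x, uint32_t y, unsigned w) {
    return concat(a, cond ? x : y, w);
}

// Common LSBs: cond ? {x, b} : {y, b}  ==  {cond ? x : y, b}
uint32_t beforeLsb(bool cond, uint32_t x, uint32_t y, uint32_t b, unsigned w) {
    return cond ? concat(x, b, w) : concat(y, b, w);
}
uint32_t afterLsb(bool cond, uint32_t x, uint32_t y, uint32_t b, unsigned w) {
    return concat(cond ? x : y, b, w);
}

// Exhaustively checks both rewrites for 2-bit fields.
bool checkIdentities() {
    const unsigned w = 2;
    for (int c = 0; c < 2; ++c)
        for (uint32_t s = 0; s < 4; ++s)  // shared part
            for (uint32_t x = 0; x < 4; ++x)
                for (uint32_t y = 0; y < 4; ++y) {
                    if (beforeMsb(c != 0, s, x, y, w) != afterMsb(c != 0, s, x, y, w))
                        return false;
                    if (beforeLsb(c != 0, x, y, s, w) != afterLsb(c != 0, x, y, s, w))
                        return false;
                }
    return true;
}
```

After the rewrite only the differing bits pass through the Cond, which is why the Cond in the result is narrower.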