Recognize "decoder" case statements (where every case item only assigns
constants to a fixed set of left-hand sides) and replace them with a
single packed constant lookup table indexed by the case expression.
Small tables are materialized inline in the generated code, and are
always optimized. Larger ones are placed in the constant pool and only
optimized if deemed beneficial over branches.
While this slightly conflicts with V3Table, and is not worth that much
on it's own, there will be a follow up patch that converts more cases of
this form which will be much more valuable. This patch does the
necessary analysis and the simple table conversion when possible.
Split -fcase into -fcase-table (this new conversion) and -fcase-tree (the
existing bitwise branch-tree conversion); -fno-case is now an alias for
both.
Default branches, assignments preceding the case (used as default values),
casez wildcards, multiple and partial left-hand sides, and both blocking and
non-blocking assignments are handled. Cases that cannot be safely tabled (e.g.
non-exhaustive with no default, overlapping writes to one variable, or mixed
blocking/non-blocking assignments) fall back to the existing if/else lowering.
Consequently disabled re-inlining of constant pool variables in V3Const,
and rebuild the constant pool hash in V3Dead (previously we didn't
create constant pool entries early enough for this to matter)
When a lot of combinational logic is driven from top level inputs,
work can be wasted evaluating that logic if the top level inputs don't
change.
This change adds an optimization by performing a change detect on the
top level inputs, and evaluate 'ico' logic only if the top level input
actually changed. This especially helps with --hierarchical/--lib-create
which runs the 'ico' of each sub-model in the eval settle loop.
This was observed to yield 40%+ run-time speedup on some partitioned
designs.
The added change detection is cheap, so it is emitted even if the 'ico'
region is small, and is on by default.
The optimization is only sound if the model itself does not write to the
top level inputs (otherwise the 'previous value' variables would be out
of sync, which are not updated by internal writes.). If we can detect a
top level input is written within the design, then for that input, we
fall back on always running the relevant logic. With --vpi we cannot
prove safety statically, so --vpi will disable this optimisation unless
explicitly enabled. (In which case it's the user's responsibility to not
write to top level inputs via the VPI.)
As per discussion. Remove the unsound V3SplitAs pass. The
isolate_assignments attribute/directive is now parsed and ignored in the
frontend for compatibility but otherwise have no effect.
Fixes#7144
This is still mostly refactoring of V3Case, but with functional changes.
Decouple the exhaustiveness/overlap analysis from the decision to
convert the case using the fast bitwise testing method. This enables
dropping the 'notParallel' assertions for those we can prove exhaustive
and unique, even if we decide to convert them using the generic if/else
ladder scheme.
Found by inspection, case inside used to threat 'x' as a value, not as a
wildcard. Per the standard it should behave as '==?' which treats both
'x' and 'z' as wildcards.
V3Gate used to inline too many expensive operations. One particularly
bad example is inlining `{<<{wide}}` (bit-reverse of a wide signal),
which is a single input node, but is quite expensive to compute, which
we always used to inline.
Change the heuristic to only inline single input nodes if they are not
wide, or a cheap wide operation, otherwise treat them the same as
multi-input ops and inline them only if they are used no more than once.
Add a simple Dfg pass that removes redundant bit selects early. This
can significantly cut down on downstream work and remove some temporary
variables introduced during synthesis.