Commit Graph

227 Commits

Author SHA1 Message Date
Geza Lore eafe9636cf
Internals: Dump Ast expression pattern statistics like Dfg (#7818)
Remove the expression combination counts from the default stats file,
and add a new `--dump-ast-patterns` option, which will dump new
`*_ast_patterns_*.txt` files. These contain the expression combinations
in a similar S-expression format as Dfg already produces with
`--dump-dfg-stats`. These dumps are not produced by just `--stats` as
they are fairly expensive to compute. Currently the new option will dump
at two points: just before we change to C types via widthMin usage, and
just before emit.
2026-06-21 22:17:36 +01:00
Geza Lore bcaa110f60
Optimize generated function inlining (#7811)
Previously V3InlineCFuncs inlined call sites but never deleted the now
dead callees. It also missed many opportunities due to evaluation order.

Rewrite using a graph based algorithm, using only a single traversal of
the netlist. This is clearer, more accurate, and faster at compile time.

Also add a clean -fno-inline-cfuncs option to disable the pass. Setting
the limits to 0 still disables inlining, except for empty functions,
which can still be inlined with 0 limits (they are no-ops). Unless
-fno-inline-cfuncs is given, the pass also prunes unused functions.
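
The dead-callee pruning described above can be pictured as a reachability walk over the call graph: anything not reachable from an entry point after inlining can be deleted. This is a minimal sketch under assumptions, not the V3InlineCFuncs code; `CallGraph`, `reachable`, `pruneUnreachable` and the function names in the example are hypothetical.

```c++
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Return the set of functions reachable from the roots (e.g. entry points
// the runtime calls directly). Anything outside this set is dead.
std::set<std::string> reachable(const CallGraph& graph,
                                const std::vector<std::string>& roots) {
    std::set<std::string> seen;
    std::vector<std::string> work(roots.begin(), roots.end());
    while (!work.empty()) {
        const std::string name = work.back();
        work.pop_back();
        if (!seen.insert(name).second) continue;  // already visited
        const auto it = graph.find(name);
        if (it == graph.end()) continue;  // unknown or external function
        for (const std::string& callee : it->second) work.push_back(callee);
    }
    return seen;
}

// Delete every function that is no longer reachable from the roots, for
// example callees whose only call sites have all been inlined.
void pruneUnreachable(CallGraph& graph, const std::vector<std::string>& roots) {
    const std::set<std::string> live = reachable(graph, roots);
    for (auto it = graph.begin(); it != graph.end();) {
        if (live.count(it->first)) {
            ++it;
        } else {
            it = graph.erase(it);
        }
    }
}
```

For example, if 'eval' calls 'helper', and 'inlinedAway' is no longer called because its only call site was inlined, `pruneUnreachable(graph, {"eval"})` keeps 'eval' and 'helper' and drops 'inlinedAway'.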

The pass now also respects `--output-split`.
2026-06-21 18:31:56 +01:00
Geza Lore a37e2ee94b
Optimize wide decoder case statements into decoder expressions (#7804)
Extend the decoder-pattern case optimization to selectors that are too
wide for a full 2^width lookup table. A decoder-pattern case (where
every case item assigns constants to a fixed set of LHSs) is lowered to
a new AstMachMasked expression. AstMachMasked is emitted as a run-time
VL_MATCHMASKED_* function call. It contains a packed constant pool table,
'matchp', which is a list of '(mask, bits)' pairs. At runtime, the index of the
first matching entry is returned and used to index a value table. This single
(albeit complicated) expression can replace whole large if-else trees, resulting
in much more compact code with fewer hard-to-predict static branches. In some
designs this gives about a 10% speedup and a 30% reduction in code size.

Example:

```systemverilog
    logic [39:0] sel;
    always_comb
      casez (sel)
        40'b???????????????????????????????????????1: out = 8'h01;
        40'b??????????????????????????????????????1?: out = 8'h02;
        40'b?????????????????????????????????????1??: out = 8'h03;
        default:                                      out = 8'hff;
      endcase
```

is compiled to:

```c++
    out = TABLE_value[VL_MATCHMASKED_Q(sel, CONST_match)];
```

Where 'CONST_match' contains 4 entries, of a 40-bit mask and 40-bit bit
pattern each, and 'TABLE_value' contains 4 entries of the corresponding
8-bit results. (Entries are aligned to word boundaries to avoid runtime
bit swizzling.)
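
The run-time lookup can be sketched in plain C++ for selectors of up to 64 bits. This is an illustration under assumptions, not the actual VL_MATCHMASKED_Q implementation: `MatchEntry`, `matchMasked`, `kMatch`, `kValue` and `decode` are hypothetical names, the scan is a simple linear loop, and entries are not packed into words as in the real table. The table below encodes the casez example, with the default item as an all-zero mask that matches any value.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One hypothetical match table entry: a selector value matches when the
// bits selected by 'mask' equal 'bits'. '?' positions of a casez item have
// a 0 in the mask, so a mask of 0 matches everything (the default item).
struct MatchEntry {
    uint64_t mask;
    uint64_t bits;
};

// Return the index of the first entry matching 'sel', or table.size() if
// none matches (a trailing default entry guarantees a match).
std::size_t matchMasked(uint64_t sel, const std::vector<MatchEntry>& table) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        if ((sel & table[i].mask) == table[i].bits) return i;
    }
    return table.size();
}

// The casez example, with the 40-bit selector held in a uint64_t.
const std::vector<MatchEntry> kMatch = {
    {0x1, 0x1},  // 40'b...???1 -> out = 8'h01
    {0x2, 0x2},  // 40'b...??1? -> out = 8'h02
    {0x4, 0x4},  // 40'b...?1?? -> out = 8'h03
    {0x0, 0x0},  // default     -> out = 8'hff
};
const uint8_t kValue[] = {0x01, 0x02, 0x03, 0xff};

uint8_t decode(uint64_t sel) { return kValue[matchMasked(sel, kMatch)]; }
```

Because entries are tried in order, `decode(0x7)` yields 8'h01 even though bits 1 and 2 are also set, which matches the priority of the casez items.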
2026-06-19 19:46:13 +01:00
Wilson Snyder 749b93e405 Commentary: Use standard multiline rst comments, other cleanups 2026-06-18 21:58:01 -04:00
Geza Lore 5712f9b614
Optimize decoder case statements into lookup tables (#7795)
Recognize "decoder" case statements (where every case item only assigns
constants to a fixed set of left-hand sides) and replace them with a
single packed constant lookup table indexed by the case expression.
Small tables are materialized inline in the generated code, and are
always optimized. Larger ones are placed in the constant pool and only
optimized if deemed beneficial over branches.
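
As an illustration of the transformation (not Verilator's generated code), the following sketch shows a hypothetical 2-bit decoder that assigns constants to two 4-bit variables, first as branches and then as a single packed table indexed by the selector. The names `decodeBranches`, `decodeTable` and `kTable`, and the nibble packing, are assumptions made for the example.

```c++
#include <cassert>
#include <cstdint>

// Branching form: every case item assigns constants to the same two
// left-hand sides, 'a' and 'b' (4 bits each).
void decodeBranches(uint8_t sel, uint8_t& a, uint8_t& b) {
    switch (sel & 0x3) {
    case 0: a = 0x1; b = 0x0; break;
    case 1: a = 0x2; b = 0x5; break;
    case 2: a = 0x4; b = 0xa; break;
    default: a = 0x8; b = 0xf; break;
    }
}

// Table form: one packed entry per selector value, with 'a' in the low
// nibble and 'b' in the high nibble, indexed directly by the selector.
constexpr uint8_t kTable[4] = {0x01, 0x52, 0xa4, 0xf8};

void decodeTable(uint8_t sel, uint8_t& a, uint8_t& b) {
    const uint8_t entry = kTable[sel & 0x3];
    a = entry & 0xf;  // unpack the low nibble
    b = entry >> 4;   // unpack the high nibble
}
```

Both forms compute the same outputs for every selector value, but the table form has no branches at all.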

While this slightly conflicts with V3Table, and is not worth that much
on its own, there will be a follow-up patch that converts more cases of
this form which will be much more valuable. This patch does the
necessary analysis and the simple table conversion when possible.

Split -fcase into -fcase-table (this new conversion) and -fcase-tree (the
existing bitwise branch-tree conversion); -fno-case is now an alias for
both.

Default branches, assignments preceding the case (used as default values),
casez wildcards, multiple and partial left-hand sides, and both blocking and
non-blocking assignments are handled. Cases that cannot be safely tabled (e.g.
non-exhaustive with no default, overlapping writes to one variable, or mixed
blocking/non-blocking assignments) fall back to the existing if/else lowering.

Consequently, disable re-inlining of constant pool variables in V3Const,
and rebuild the constant pool hash in V3Dead (previously we didn't
create constant pool entries early enough for this to matter).
2026-06-18 09:30:50 +01:00
Wilson Snyder c86816476c Commentary: Changes update 2026-06-15 17:37:49 -04:00
Geza Lore 5ab2bf1ec4
Optimize input combinational logic by change detection (#7784)
When a lot of combinational logic is driven from top level inputs,
work can be wasted evaluating that logic if the top level inputs don't
change.

This change adds an optimization that performs change detection on the
top level inputs, and evaluates 'ico' logic only if a top level input
actually changed. This especially helps with --hierarchical/--lib-create,
which runs the 'ico' of each sub-model in the eval settle loop.
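
A minimal sketch of the change detection, assuming a hypothetical model with two inputs and a single 'ico' function. `Model`, `evalIco`, the `prev_*` fields and the `first` flag are illustrative names, not the generated code.

```c++
#include <cassert>
#include <cstdint>

struct Model {
    // Top-level inputs, written by the testbench before each eval().
    uint32_t in_a = 0;
    uint32_t in_b = 0;
    // Output of the input-combinational ('ico') logic.
    uint32_t sum = 0;
    // Previous input values for change detection; 'first' forces the
    // initial evaluation.
    uint32_t prev_a = 0;
    uint32_t prev_b = 0;
    bool first = true;
    // How many times the 'ico' logic actually ran (for illustration).
    int icoRuns = 0;

    void evalIco() {
        sum = in_a + in_b;
        ++icoRuns;
    }

    void eval() {
        // Cheap comparison against the previous values.
        const bool changed = first || in_a != prev_a || in_b != prev_b;
        if (changed) {
            prev_a = in_a;
            prev_b = in_b;
            first = false;
            evalIco();
        }
        // ... the rest of the evaluation would follow here ...
    }
};
```

The sketch also shows why internal writes break soundness: if the design itself assigned `in_a`, `prev_a` would not be updated, and a later comparison could miss a change.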

This was observed to yield 40%+ run-time speedup on some partitioned
designs.

The added change detection is cheap, so it is emitted even if the 'ico'
region is small, and is on by default.

The optimization is only sound if the model itself does not write to the
top level inputs (otherwise the 'previous value' variables, which are not
updated by internal writes, would be out of sync). If we can detect that a
top level input is written within the design, then for that input we
fall back on always running the relevant logic. With --vpi we cannot
prove safety statically, so --vpi disables this optimization unless it is
explicitly enabled (in which case it is the user's responsibility not to
write to top level inputs via the VPI).
2026-06-15 05:42:00 +01:00
Wilson Snyder 816ab67826 Commentary: Changes update 2026-06-05 18:36:55 -04:00
Yogish Sekhar cf8713aebc
Add `--coverage-per-instance` 2026-05-24 18:08:55 -04:00
Yilou Wang 00c9e58006
Fix internal error on consecutive repetition with N > 256 (#7552) (#7603) 2026-05-17 21:54:10 -04:00
Igor Zaworski 25d4827bd5
Internals: Four state pre-pull (types) (#7520) 2026-04-30 16:56:15 -04:00
Yogish Sekhar a680919edc
Support native FSM state and arc coverage (#7412) 2026-04-22 15:18:59 -04:00
Geza Lore 2b9d006097
Change Dfg pattern dumps to use --dump-dfg-patterns (#7455)
Dumping Dfg patterns can take a non-trivial amount of time, so do it
only with --dump-dfg-patterns, instead of with --stats.
Also further improve dumping format.
2026-04-21 12:07:19 +01:00
Geza Lore 97454a1bc5
Remove multi-threaded FST tracing (#7443)
Remove parallel (using the FST library writer thread) and offloaded
(separate Verilator internal thread) tracing (only used by FST). These
are not compatible with #6992, and #5806 should yield better performance
in all cases.

Consequently, mark the '--trace-threads' and '--trace-fst-thread' options
as deprecated.
2026-04-19 16:02:12 +01:00
Geza Lore 9f9532ff78
Optimize Dfg only once, after V3Scope (#7362) 2026-04-09 08:31:12 -04:00
Wilson Snyder 947cbaf330 Deprecate `--structs-packed` (#7222). 2026-03-21 10:59:27 -04:00
Wilson Snyder 3097df46fa Change `--converge-limit` default to 10000 (#7209).
Fixes #7209.
2026-03-07 09:05:37 -05:00
Rahul Behl 9a5c1d27c8
Support array reduction methods with 'with' clause in constraints (#6455) (#6999) 2026-03-04 12:01:35 -05:00
jalcim 7cf539cf05
Add --func-recursion-depth CLI option (#7175) (#7179) 2026-03-04 06:46:07 -05:00
Geza Lore 098fe96643
Add V3LiftExpr pass to lower impure expressions and calls (#7141)
Introduce a new pass that converts impure expressions, and those containing
function and method calls, into simple assignment statements. Please see
the blurb at the top of the file for why this is useful and how it works.
In particular, it currently enables more Dfg optimization, as functions
will be inlined without AstExprStmt.
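
The shape of the rewrite can be illustrated in C++, even though the pass itself operates on the Verilator AST. In this sketch, with hypothetical names `scale`, `beforeLift` and `afterLift`, a call with a side effect is lifted out of a larger expression into an assignment to a temporary, leaving a pure expression behind.

```c++
#include <cassert>

// Hypothetical impure function: it counts its calls as a side effect.
int scaleCalls = 0;
int scale(int v) {
    ++scaleCalls;
    return v * 2;
}

// Before lowering: the call is buried inside a larger expression.
int beforeLift(int a, int b) { return (a & b) + scale(a); }

// After lowering: the call becomes its own assignment to a temporary,
// and the remaining expression only reads simple variables.
int afterLift(int a, int b) {
    const int tmp = scale(a);
    return (a & b) + tmp;
}
```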

Ideally we should enforce this lowering is applied to every procedural
statement (there are still a handful of exceptions). With that, long
term with this pass + #6820, there should be no need to ever use an
AstExprStmt past this new lowering pass, which should make optimization
easier down the line.

Also ideally this should be run earlier. Currently it's after V3Tristate
as that calls pinReconnectSimple so we don't have to touch Cell ports.

Currently disabled when code coverage is enabled due to #7119.
2026-02-28 22:20:09 +00:00
Wilson Snyder 7dde11b4c6 Docs: Split control.rst from exe_verilator.rst. 2026-02-24 21:11:39 -05:00
Todd Strader 6a5d3b0b72
Add --max-replication option (#7139) 2026-02-23 16:51:37 -05:00
Wilson Snyder 28d04c809f Commentary: Changes update 2026-02-16 05:38:03 -05:00
Geza Lore 505d33b35a
Support #0 delays with IEEE-1800 compliant semantics (#7079)
This patch adds IEEE-1800 compliant scheduling support for the Inactive
scheduling region used for #0 delays.
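
For reference, IEEE 1800 places a process that executes #0 in the Inactive region of the current time slot, and Inactive events only move to the Active region once the Active region is empty. The toy model below sketches just those two regions; `TimeSlot` and its methods are hypothetical, and this is not how Verilator's 'act' section is implemented.

```c++
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of one time slot with only the Active and Inactive regions
// (IEEE 1800 defines several more regions than this).
class TimeSlot {
public:
    using Event = std::function<void()>;

    void scheduleActive(Event e) { active_.push_back(std::move(e)); }

    // A #0 delay suspends the process and resumes it as an Inactive event.
    void scheduleZeroDelay(Event e) { inactive_.push_back(std::move(e)); }

    void run() {
        while (!active_.empty() || !inactive_.empty()) {
            // Active region drained: promote all Inactive events.
            if (active_.empty()) active_.swap(inactive_);
            Event e = std::move(active_.front());
            active_.pop_front();
            e();  // may schedule further Active or Inactive events
        }
    }

private:
    std::deque<Event> active_;
    std::deque<Event> inactive_;
};
```

With process p1 executing a #0 while process p2 is already active, the order of execution is p1 (before #0), then p2, then p1 (after #0).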

Implementing this requires that **all** IEEE-1800 active region events
are placed in the internal 'act' section. This has simulation
performance implications. It prevents some optimizations (e.g.
V3LifePost), which reduces single threaded performance. It also reduces
the available work and parallelism in the internal 'nba' section, which
severely reduces the effectiveness of multi-threading.

Performance impact on RTLMeter when using scheduling adjusted to support
proper #0 delays is ~10-20% slowdown in single-threaded mode, and ~100%
(2x slower) with --threads 4.

To avoid paying this performance penalty unconditionally, the scheduling
is only adjusted if either:
1. The input contains a statically known #0 delay
2. The input contains a variable #x delay unknown at compile time

If no #0 is present, but #x variable delays are, a ZERODLY warning is
issued advising the use of '--no-sched-zero-delay' which is a promise
by the user that none of the variable delays will evaluate to a zero
delay at run-time. This warning is turned off if '--sched-zero-delay'
is explicitly given. This is similar to the '--timing' option.

If '--no-sched-zero-delay' was used at compile time, then executing
a zero delay will fail at runtime.

A ZERODLY warning is also issued if a static #0 is found, but the user
specified '--no-sched-zero-delay'. In this case the scheduling is not
adjusted to support #0, so executing it will fail at runtime. Presumably
the user knows it won't be executed.

The intended behaviour with all this is the following:

No #0, no #var in the design (#constant is OK)
-> Same as current behaviour, scheduling not adjusted,
   same code generated as before

Has static #0 and '--no-sched-zero-delay' is NOT given:
-> No warnings, scheduling adjusted so it just works, runs slow

Has static #0 and '--no-sched-zero-delay' is given:
-> ZERODLY on the #0, scheduling not adjusted, fails at runtime if hit

No static #0, but has #var and no option is given:
-> ZERODLY on the #var advising use of '--no-sched-zero-delay' or
   '--sched-zero-delay' (similar to '--timing'), scheduling adjusted
   assuming it can be a zero delay and it just works

No static #0, but has #var and '--no-sched-zero-delay' is given:
-> No warning, scheduling not adjusted, fails at runtime if zero delay

No static #0, but has #var and '--sched-zero-delay' is given:
-> No warning, scheduling adjusted so it just works
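
The cases above can be summarized as a small decision function. This is a sketch of the listed behaviour only, with hypothetical names (`ZeroDelayOpt`, `ZeroDelayDecision`, `decide`); it is not Verilator's code. Where the list is silent (no #0 and no #var, but '--sched-zero-delay' given), the sketch follows the rule that scheduling is only adjusted when the design contains a #0 or a variable delay.

```c++
#include <cassert>

// Command line state: neither option, '--sched-zero-delay', or
// '--no-sched-zero-delay'.
enum class ZeroDelayOpt { Unset, Sched, NoSched };

struct ZeroDelayDecision {
    bool adjustScheduling;  // use the slower #0-capable scheduling
    bool warnZeroDly;       // issue a ZERODLY warning
};

ZeroDelayDecision decide(bool hasStaticZero, bool hasVarDelay, ZeroDelayOpt opt) {
    if (hasStaticZero) {
        // Static #0: adjust, unless the user promised it is never executed.
        if (opt == ZeroDelayOpt::NoSched) return {false, true};
        return {true, false};
    }
    if (hasVarDelay) {
        switch (opt) {
        case ZeroDelayOpt::Unset: return {true, true};     // warn, assume zero possible
        case ZeroDelayOpt::Sched: return {true, false};    // explicitly requested
        case ZeroDelayOpt::NoSched: return {false, false}; // user promise
        }
    }
    // Neither #0 nor variable delays: scheduling is not adjusted.
    return {false, false};
}
```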
2026-02-16 03:55:55 +00:00
Geza Lore 3dd2b762e7
Fix scope tree in traces in hierarchical mode (#7042) 2026-02-12 20:54:03 -05:00
Geza Lore bb0e1c8c61
Optimize temporary insertion for concatenations in Dfg (#7013)
Add a new Dfg pass 'pushDownSel'. This will try to move selects through
a tree of concatenations in order to eliminate temporary nodes holding
intermediate concatenation results. This can get rid of a lot of
variables when packed arrays are assigned in parts (e.g. bit-wise).
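
The core rewrite for one concatenation can be sketched as follows: a select that falls entirely within one operand of `{hi, lo}` becomes a select of that operand with an adjusted LSB, and applying this repeatedly walks a select down a tree of concatenations. The names `sel`, `concat`, `PushedSel` and `pushDownSel` are hypothetical; the sketch simply leaves selects that straddle both operands in place, and does not claim to match how the Dfg pass handles that case.

```c++
#include <cassert>
#include <cstdint>

// Extract 'width' bits starting at bit 'lsb' (0 < width < 64 assumed).
uint64_t sel(uint64_t value, unsigned lsb, unsigned width) {
    assert(width > 0 && width < 64);
    return (value >> lsb) & ((uint64_t{1} << width) - 1);
}

// {hi, lo}, where 'lo' is 'loWidth' bits wide.
uint64_t concat(uint64_t hi, uint64_t lo, unsigned loWidth) {
    return (hi << loWidth) | lo;
}

// Result of pushing a select through one concatenation.
struct PushedSel {
    bool fromHi;   // true: select from 'hi'; false: select from 'lo'
    unsigned lsb;  // adjusted LSB within that operand
};

// Try to rewrite sel({hi, lo}, lsb, width) as a select of one operand.
// Returns false if the range straddles both operands.
bool pushDownSel(unsigned loWidth, unsigned lsb, unsigned width, PushedSel& out) {
    if (lsb + width <= loWidth) {
        out = {false, lsb};  // entirely within 'lo'
        return true;
    }
    if (lsb >= loWidth) {
        out = {true, lsb - loWidth};  // entirely within 'hi'
        return true;
    }
    return false;
}
```

Once every select reads an operand directly, the temporary holding the full concatenation is no longer needed.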
2026-02-07 18:06:12 +00:00
Wilson Snyder 7c6c6a684b Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
Luca Colagrande f9f7a7146d
Commentary: Fix `--trace` flag description in docs (#6884) 2026-01-06 07:16:35 -05:00
Wilson Snyder 40cf3c4b16 Remove deprecated `--make cmake`. 2026-01-01 09:27:20 -05:00
Wilson Snyder a7b80966ec Remove `--xml-only`. 2026-01-01 09:23:05 -05:00
Wilson Snyder 13327fa9c0 Copyright year update. 2026-01-01 07:22:09 -05:00
Wilson Snyder 4080284e53
Fix warning lint directive ordering and consistency (#4185) (#5368) (#5610) (#6876). 2025-12-30 20:31:34 -05:00
Wilson Snyder e6114b6bbb Commentary 2025-12-30 08:24:41 -05:00
Iztok Jeras 6a07595a44
Commentary: Text formatting fix (#6863) 2025-12-25 19:01:38 -05:00
Wilson Snyder 1b93033690 Add `--quiet-build` to suppress make/compiler informationals. 2025-12-23 19:21:42 -05:00
Wilson Snyder 921ad64d22 Commentary: Changes update 2025-12-23 19:20:42 -05:00
Wilson Snyder 5dc05e1fa8 Internals: Update some JSON references. No functional change. 2025-12-23 10:13:23 -05:00
Jose Drowne c0a0f0dab9
Optimize inlining small C functions and add `-inline-cfuncs` (#6815) 2025-12-21 13:14:50 -05:00
Wilson Snyder 605915f307 Commentary: Changes update 2025-12-20 22:04:29 -05:00
Geza Lore f990dd747e
Change metacomments to not enable warnings disabled in control file (#6836) (#6842)
Track the location based message/feature enable bits separately for code
and control file directives. A message/feature is disabled if disabled
either in the control file, or in code directives/metacomments. That is,
enabled only if both agree should be enabled.
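
A minimal sketch of the combined check, assuming a hypothetical per-location record with one enable bit per message from each source. `LocationEnables`, the `Msg` codes and the field names are illustrative, not Verilator's data structures.

```c++
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical message codes; a real list would be much longer.
enum Msg : std::size_t { MSG_WIDTH = 0, MSG_UNUSED = 1, MSG_CASEINCOMPLETE = 2, MSG_COUNT = 3 };

// Enable bits for one source location, tracked separately for control
// file directives and for in-code directives/metacomments.
struct LocationEnables {
    std::bitset<MSG_COUNT> fromControlFile;
    std::bitset<MSG_COUNT> fromCode;

    LocationEnables() {
        fromControlFile.set();  // all enabled until something disables them
        fromCode.set();
    }

    // Enabled only if both sources agree, so a metacomment cannot
    // re-enable a message that the control file disabled.
    bool enabled(Msg m) const { return fromControlFile[m] && fromCode[m]; }
};
```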
2025-12-20 06:33:46 -05:00
Wilson Snyder b90865a08a Change `--lint-only` and `--json-only` to imply `--timing` (#6790). 2025-12-17 19:24:43 -05:00
Wilson Snyder f1ee434dca Commentary: Changes update 2025-12-16 20:43:08 -05:00
Dan Ruelas-Petrisko 394d9cf168
Support `-libmap` (#5891 partial) (#6764) 2025-12-16 11:21:46 -05:00
Wilson Snyder 91a59bbcc5 Documentation: Adapt format suggested by docstrfmt 2025-11-22 10:59:38 -05:00
Wilson Snyder 4cc4ff3e07 Commentary: Fix some .rst style issues 2025-11-21 22:25:03 -05:00
Wilson Snyder 7e3cab8e5d Commentary: Changes update 2025-11-21 19:39:51 -05:00
Jakub Wasilewski 0b8c369740
Add `sc_biguint` pragma (#6712) 2025-11-20 17:08:59 -05:00
Geza Lore a1056c6ae9
Add `-param`/`-port` options to `public_flat*` control directives (#6685) 2025-11-13 06:59:02 -05:00
Geza Lore 0dc9f779f8
Add `-fno-inline-funcs-eager` option to disable excessive inlining (#6682) 2025-11-11 21:46:19 +00:00
Wilson Snyder c87a3e92fc Commentary: Changes update 2025-11-09 14:50:31 -05:00