Commit Graph

113 Commits

Author SHA1 Message Date
Thomas Santerre bd6b9161dc
Optimize bit-scan loops into $mostsetbitp1 / $countones (#7822)
Recognize the common single-bit scan loop idioms in V3Unroll (before it
unrolls) and lower them to bit-reduction primitives, replacing a literal
W-iteration loop with one intrinsic-backed expression:

  target=0; for (i=0;i<W;i++) if (vec[i]) target = i + 1;      -> $mostsetbitp1(vec)
  target=0; for (i=0;i<W;i++) if (vec[i]) target = target + 1; -> $countones(vec)

The leading-one form lowers to a new AstMostSetBitP1 node, emitted as
VL_MOSTSETBITP1_{I,Q,W}; those runtime helpers now use __builtin_clz where
available (same pattern as VL_REDXOR's __builtin_parity), with the existing
bit scan as fallback.  The count-ones form reuses AstCountOnes ($countones,
popcount); as the DFG requires a 32-bit countones result it is built at 32
bits and narrowed to the accumulator width with a select.

Matching is structural to stay sound: the index must start at 0, increment
by exactly 1, and scan all W==width(vec) bits via a single 1-bit select of a
distinct vector, with the target pre-zeroed and no else branch.  The loop
bound is accepted as a strict ascending 'idx < W' written either way and
signed or unsigned (Gt/GtS/Lt/LtS).  Gated by -fbit-scan-loops (on at -O).

Adds t_bit_scan_loops (I/Q/W, count-ones and unsigned-index positives;
step-2, start-1, idx*2+1, vec[idx+1], target=idx and W!=width negatives, all
self-checked and asserted via --stats not to lower) plus t_bit_scan_loops_off
for the disable flag.

Motivated by a transformer inference design whose 80-bit leading-one detector
ran every cycle (~37% of runtime); the lowering is worth ~39% there.
2026-06-24 10:43:05 +01:00
Geza Lore a37e2ee94b
Optimize wide decoder case statements into decoder expressions (#7804)
Extend the decoder-pattern case optimization to selectors that are too
wide for a full 2^width lookup table. A decoder-pattern case (where
every case item assigns constants to a fixed set of LHSs) is lowered to
a new AstMachMasked expression. AstMachMasked is emitted as a run-time
VL_MATCHMASKEd_* function call. It contains a packed constant pool table,
'matchp', which is a list of '(mask, bits)' pairs. At runtime, the index of the 
first matching entry is returned, and is used to index a value table. This single
(albeit complicated) expression can replace large if-else trees whole, resulting
in much more compact code with fewer static hard to predict branches. It
is worth about 10% speed and 30% code size in some designs.

Example:

```systemverilog
    logic [39:0] sel;
    always_comb
      casez (sel)
        40'b???????????????????????????????????????1: out = 8'h01;
        40'b??????????????????????????????????????1?: out = 8'h02;
        40'b?????????????????????????????????????1??: out = 8'h03;
        default:                                      out = 8'hff;
      endcase
```

is compiled to:

```c++
    out = TABLE_value[VL_MATCHMASKED_Q(sel, CONST_match)];
```

Where 'CONST_match' contains 4 entries, of a 40-bit mask and 40-bit bit
pattern each, and 'TABLE_value' contains 4 entries of the corresponding
8-bit results. (Entries are aligned to word boundaries to avoid runtime
bit swizzling)
2026-06-19 19:46:13 +01:00
Wilson Snyder df78db0e73 Fix $fflush and autoflush with --threads (#7782).
Fixes #7782.
2026-06-14 13:36:49 -04:00
Geza Lore c878a7e735 Optimize VL_ONEHOT
Equivalent to `__builtin_popcount(_) == 1` and a few instructions
shorter.
2026-06-03 11:24:34 +01:00
Geza Lore 7b45b3ee8a Optimize string formatting runtime functions
Add an overload that takes `const char*` format string. This avoids
having to construct/destruct a temporary `std::string` at the call site
when calling these functions with a string literal, which is fairly
common. This is worth 25% code size on some assertion heavy design.

Also use `std::string::clear()` instead of assigning an empty string
which is more efficient.
2026-06-03 11:18:38 +01:00
Geza Lore efb83c55de
Optimize VL_SFORMAT when result is string (#7702)
Omit unused bit width parameter. Emit rule is simple enough to change
and the unused parameter just bloats code size.
2026-06-03 08:42:48 +01:00
Paul Swirhun 1eb12685a7
Support streaming into 2-D unpacked arrays (#7686) (#7687)
Fixes #7686.
2026-05-30 22:08:32 -04:00
Geza Lore bd4bd82d4b Optimize primitive runtime functions
These were a victim of recent refactor but are actually important
(#4733). It also helps in debug builds not to use -O0 memcpy/memset.
2026-05-24 11:39:39 +01:00
Geza Lore c99aa8ede5
Fix erroneous implicit conversions of VlWide (#7642)
Change WDataInP/WDataOutP to be opaque handles types instead of aliases
to raw pointers. This subsequently eliminates needing an implicit cast
operator in VlWide, which is replaced with implicit constructors of
WDataInP/WDataOutP that can create a handle from a VlWide. This
eliminates some unsafe conversions that the previous implicit cast
operator unintentionally enabled (e.g. #7618). It also eliminates
having to insert ".data()" in various places int he generated code, which
simplifies internals (the only place ".data()" should be needed is in
calls to variadic functions where the expected type of the argument is
not WDataInP/WDataOutP).

The handles otherwise behave like pointers, implementing the minimal
amount of operators required to code the runtime. The handle is still
only a single pointer, and will be passed in registers as before, so
this patch should be performance neutral.

As part of this removed WData, which used to be an alias for EData.
All uses are now either EData*, WDataInP, WDataOutP, or VlWide directly.
2026-05-22 20:05:08 +01:00
Benjamin Collier 69b3c5f6d1
Support streaming on queues (#7597) 2026-05-20 19:14:02 -04:00
Wilson Snyder 31757df229
Internals: clangtidy cleanups. No functional change intended (#7343) 2026-03-27 23:14:18 -04:00
Yilou Wang 998ec5b1d7
Fix streaming with descending unpacked arrays and unpacked-to-queue (#7287) 2026-03-20 09:51:35 -04:00
Wilson Snyder 602ee384de
Support $sformat with runtime format string (#7212). (#7257)
Fixes #7212.
2026-03-14 22:43:56 -04:00
Wilson Snyder d40036239b Fix wide conditional short circuiting (#7155).
Fixes #7155.
2026-02-28 09:40:10 -05:00
Wilson Snyder 0051d91555 Fix too-short bit pack returning wrong value (#7111).
Fixes #7111.
2026-02-20 22:25:02 -05:00
Geza Lore 5e5dcdbdbd
Optimize right shifts as clean (#6981) 2026-02-01 08:12:18 -05:00
Wilson Snyder 7c6c6a684b Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
Wilson Snyder 13327fa9c0 Copyright year update. 2026-01-01 07:22:09 -05:00
Jakub Wasilewski 0b8c369740
Add `sc_biguint` pragma (#6712) 2025-11-20 17:08:59 -05:00
Geza Lore 922223a9c3
Internals: Replace VlTriggerVec with unpacked array (#6616)
Removed the VlTriggerVec type, and refactored to use an unpacked array
of 64-bit words instead. This means the trigger vector and its
operations are now the same as for any other unpacked array. The few
special functions required for operating on a trigger vector are now
generated in V3SchedTrigger as regular AstCFunc if needed.

No functional change intended, performance should be the same.
2025-10-31 18:29:11 +00:00
Wilson Snyder ac2859bf24
Internals: Upgrade to clang-format-18 (#6333) 2025-08-25 20:47:48 -04:00
Wilson Snyder f71b8e6195 Internals: Fix up include/ cppcheck issues (#6311) 2025-08-19 21:36:52 -04:00
Paul Swirhun bd2cb989d1 Support bit queue streaming (#5830) (#6103). 2025-07-27 15:29:56 -04:00
Wilson Snyder 470f99694e Revert d8dbb08a: Support bit queue streaming (#5830) (#6103) 2025-07-26 17:59:52 -04:00
Paul Swirhun d8dbb08a95
Support bit queue streaming (#5830) (#6103) 2025-07-26 16:53:51 -04:00
George Polack f1826a7c20
Support Verilog real to SystemC double (#6136) (#6158) 2025-07-25 20:05:36 +02:00
Wilson Snyder 6af694b04b Support `$timeformat` with missing arguments (#6113). 2025-06-24 17:30:05 -04:00
Paul Swirhun ac06b6fc4f
Fix streaming operator packing order (#5903) (#6077) 2025-06-10 17:23:16 -04:00
Todd Strader d9534ec626
Fix x assign vs init randomization (#6075) 2025-06-09 17:59:01 -04:00
Todd Strader 4b041c636f
Fix --x-initial and --x-assign random stability (#2662) (#5958) (#6018) (#6025) 2025-05-27 09:31:55 -04:00
Wilson Snyder af30436357 Internals: Rename VL_PACK/VL_UNPACK in prep for future fix. No functional change intended. 2025-05-22 06:54:16 -04:00
Wilson Snyder 640339ec36 Revert 'Fix --x-initial and --x-assign random stability (#2662) (#5958).' See (#6018).
Reverts 4581023805 plus line in Changes file
2025-05-17 20:27:03 -04:00
Todd Strader 4581023805
Fix --x-initial and --x-assign random stability (#2662) (#5958) 2025-05-16 13:54:51 -04:00
github action e3e8f18a4e Apply 'make format' 2025-04-15 01:41:13 +00:00
Brian Li 50d7f2afc6
Support assigning unpacked arrays to queues (#5924) (#5928) 2025-04-14 21:40:17 -04:00
Wilson Snyder 8fbb725f34 Copyright year update. 2025-01-01 08:30:25 -05:00
Wilson Snyder 0c820c3068 Internals: Standardize template argument names. No functional change. 2024-11-29 20:20:38 -05:00
Wilson Snyder 0fe8c73d19 Fix `$fatal` to not be affected by `+verilator+error+limit` (#5135). 2024-09-13 20:45:44 -04:00
Krzysztof Bieganski 088862d449
Support appending to queue via `[]` (#5421) 2024-09-02 09:45:47 -04:00
Arkadiusz Kozdra e6fe367bdb
Support streams to/from arrays of wide data (#5334) 2024-08-06 16:18:16 +01:00
Arkadiusz Kozdra a32b8d80f9
Support streaming operator on arrays and wide data (#5326)
Signed-off-by: Arkadiusz Kozdra <akozdra@antmicro.com>
2024-08-06 08:48:46 -04:00
Andrew Nolte 60f9e21d8c
Fix `--x-assign` to be independent from `+verilator+rand+reset` (#5214) 2024-07-14 17:04:00 -04:00
Wilson Snyder ad2862ce3f Fix DPI import of null C-string (#5179). 2024-06-14 22:50:54 -04:00
Bartłomiej Chmiel 13114f2efe
Fix SystemC BITS_PER_DIGIT in VL_ASSIGN_SBW (#5170)
The BITS_PER_DIGIT macro value differs between SystemC versions,
thus referencing it directly is required.
2024-06-10 20:45:49 -04:00
Arkadiusz Kozdra 739be2f782
Support constrained randomization with external solvers (#4947) 2024-05-17 10:38:34 -04:00
Wilson Snyder 8e44487354 Tests: update style 2024-04-13 08:02:25 -04:00
Wilson Snyder 1012c054e6 Fix `$system` with string argument (#5042). 2024-04-11 17:31:14 -04:00
Fuad Ismail 1c79df8630
Support stream operation on unpacked array (#4714) (#5006) 2024-03-21 18:26:42 -04:00
Wilson Snyder ba47da6587 Internals: Move vl_timescaled_double to verilated_funcs.h. No functional change intended 2024-03-10 22:34:32 -04:00
Kefa Chen 5f1dc73a1b
Support public packed struct / union (#860) (#4878) 2024-03-03 10:23:04 -05:00