verilator/include
Thomas Santerre bd6b9161dc
Optimize bit-scan loops into $mostsetbitp1 / $countones (#7822)
Recognize the common single-bit scan loop idioms in V3Unroll (before it
unrolls) and lower them to bit-reduction primitives, replacing a literal
W-iteration loop with one intrinsic-backed expression:

  target=0; for (i=0;i<W;i++) if (vec[i]) target = i + 1;      -> $mostsetbitp1(vec)
  target=0; for (i=0;i<W;i++) if (vec[i]) target = target + 1; -> $countones(vec)
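
As a rough illustration of what the two idioms compute, the loops above can be modelled literally in C++ for vectors of up to 64 bits. The function names below are hypothetical and exist only for this sketch; they are not part of Verilator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of:
//   target=0; for (i=0;i<W;i++) if (vec[i]) target = i + 1;
// Returns the index of the highest set bit plus one, or 0 if no bit is set.
inline uint32_t scanLeadingOneP1(uint64_t vec, int width) {
    assert(width >= 0 && width <= 64);
    uint32_t target = 0;
    for (int i = 0; i < width; ++i) {
        if ((vec >> i) & 1ULL) target = static_cast<uint32_t>(i) + 1;
    }
    return target;
}

// Hypothetical model of:
//   target=0; for (i=0;i<W;i++) if (vec[i]) target = target + 1;
// Returns the number of set bits in the low 'width' bits.
inline uint32_t scanCountOnes(uint64_t vec, int width) {
    assert(width >= 0 && width <= 64);
    uint32_t target = 0;
    for (int i = 0; i < width; ++i) {
        if ((vec >> i) & 1ULL) target = target + 1;
    }
    return target;
}
```

For example, with vec = 8'b00010100 the first loop ends with target = 5 (highest set bit is bit 4) and the second with target = 2.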

The leading-one form lowers to a new AstMostSetBitP1 node, emitted as
VL_MOSTSETBITP1_{I,Q,W}; those runtime helpers now use __builtin_clz where
available (same pattern as VL_REDXOR's __builtin_parity), with the existing
bit scan as fallback.  The count-ones form reuses AstCountOnes ($countones,
popcount); because the DFG requires a 32-bit countones result, it is built at
32 bits and narrowed to the accumulator width with a select.
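
A minimal sketch of the builtin-with-fallback pattern described above, assuming a GCC or Clang compatible compiler for the fast path. The names are hypothetical and this is not the actual body of VL_MOSTSETBITP1_I or VL_MOSTSETBITP1_Q; the narrowing helper only models "build at 32 bits, then select the low bits" with a mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit helper: index of the highest set bit plus one, 0 if none.
inline uint32_t mostSetBitP1_32(uint32_t v) {
#if defined(__GNUC__) || defined(__clang__)
    // __builtin_clz is undefined for 0, so the zero case is handled first.
    return v ? 32u - static_cast<uint32_t>(__builtin_clz(v)) : 0u;
#else
    // Portable fallback: shift right until nothing is left.
    uint32_t n = 0;
    while (v) { ++n; v >>= 1; }
    return n;
#endif
}

// Hypothetical 64-bit helper, same idea with the long long builtin.
inline uint32_t mostSetBitP1_64(uint64_t v) {
#if defined(__GNUC__) || defined(__clang__)
    return v ? 64u - static_cast<uint32_t>(__builtin_clzll(v)) : 0u;
#else
    uint32_t n = 0;
    while (v) { ++n; v >>= 1; }
    return n;
#endif
}

// Hypothetical model of the count-ones narrowing: compute a 32-bit count,
// then keep only the low 'accWidth' bits, as a select would.
inline uint32_t countOnesNarrowed(uint32_t vec, int accWidth) {
    assert(accWidth >= 1 && accWidth <= 32);
    uint32_t count = 0;
    for (; vec; vec &= vec - 1) ++count;  // clears the lowest set bit each pass
    return accWidth == 32 ? count : (count & ((1u << accWidth) - 1u));
}
```

The mask reproduces the wrap-around a narrow accumulator would show in the original loop: five set bits counted into a 2-bit target yield 1.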

Matching is structural to stay sound: the index must start at 0, increment
by exactly 1, and scan all W==width(vec) bits via a single 1-bit select of a
distinct vector, with the target pre-zeroed and no else branch.  The loop
bound must be a strict ascending comparison, written either as 'idx < W' or
'W > idx', signed or unsigned (Lt/LtS/Gt/GtS).  Gated by -fbit-scan-loops
(on at -O).

Adds t_bit_scan_loops (I/Q/W, count-ones and unsigned-index positives;
step-2, start-1, idx*2+1, vec[idx+1], target=idx and W!=width negatives, all
self-checked and asserted via --stats not to lower) plus t_bit_scan_loops_off
for the disable flag.
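
To illustrate why near-miss loops such as the step-2 and target=idx negatives must not be lowered, the sketch below models two of them in C++ next to a reference leading-one scan. All names are hypothetical and the code is not taken from t_bit_scan_loops.

```cpp
#include <cassert>
#include <cstdint>

// Reference: index of the highest set bit plus one, 0 if none.
inline uint32_t refMostSetBitP1(uint32_t vec, int width) {
    assert(width >= 0 && width <= 32);
    uint32_t target = 0;
    for (int i = 0; i < width; ++i) {
        if ((vec >> i) & 1u) target = static_cast<uint32_t>(i) + 1;
    }
    return target;
}

// Near miss 1: the index steps by 2, so odd bits are never inspected.
inline uint32_t stepTwoScan(uint32_t vec, int width) {
    assert(width >= 0 && width <= 32);
    uint32_t target = 0;
    for (int i = 0; i < width; i += 2) {
        if ((vec >> i) & 1u) target = static_cast<uint32_t>(i) + 1;
    }
    return target;
}

// Near miss 2: assigns idx rather than idx + 1, so "bit 0 set" and
// "no bit set" both produce 0.
inline uint32_t targetIdxScan(uint32_t vec, int width) {
    assert(width >= 0 && width <= 32);
    uint32_t target = 0;
    for (int i = 0; i < width; ++i) {
        if ((vec >> i) & 1u) target = static_cast<uint32_t>(i);
    }
    return target;
}
```

With vec = 4'b0010 the step-2 loop yields 0 where the reference yields 2, and with vec = 4'b0001 the target=idx loop yields 0 where the reference yields 1, so replacing either loop with the intrinsic would change behaviour.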

Motivated by a transformer inference design whose 80-bit leading-one detector
ran every cycle (~37% of runtime); the lowering is worth ~39% there.
2026-06-24 10:43:05 +01:00
fstcpp Update fst from upstream (#6771 partial) 2026-06-22 17:25:59 -04:00
vltstd Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
.gitignore Support VPI product info, warning calls, etc, bug588. 2013-01-17 21:40:37 -05:00
AGENTS.md CI: Autoformat markdown files 2026-06-15 17:44:50 -04:00
verilated.cpp Support VPI access to unpacked struct members (#7823) 2026-06-23 07:04:51 -04:00
verilated.h Support VPI access to unpacked struct members (#7823) 2026-06-23 07:04:51 -04:00
verilated.mk.in Support new FST writer API (#6871) (#6992) 2026-05-12 07:39:43 -04:00
verilated.v Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_config.h.in Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_cov.cpp Support native FSM state and arc coverage (#7412) 2026-04-22 15:18:59 -04:00
verilated_cov.h Support native FSM state and arc coverage (#7412) 2026-04-22 15:18:59 -04:00
verilated_cov_key.h Support covergroups, coverpoints, and bins (#784) (#7117) 2026-06-05 09:35:01 -04:00
verilated_cov_model.h Support covergroup runtime model Phase A1 (#7728) 2026-06-12 11:40:48 -04:00
verilated_covergroup.cpp Support covergroup runtime model Phase A1 (#7728) 2026-06-12 11:40:48 -04:00
verilated_covergroup.h Support covergroup runtime model Phase A1 (#7728) 2026-06-12 11:40:48 -04:00
verilated_dpi.cpp Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_dpi.h Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_force.h Fix force unpacked bitselect (#7744) (#7745) 2026-06-15 21:57:59 -04:00
verilated_fst_c.cpp Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_fst_c.h Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_fst_sc.cpp Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_fst_sc.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_funcs.h Optimize bit-scan loops into $mostsetbitp1 / $countones (#7822) 2026-06-24 10:43:05 +01:00
verilated_imp.h Internals: clangtidy cleanups. No functional change intended (#7343) 2026-03-27 23:14:18 -04:00
verilated_intrinsics.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_probdist.cpp Internals: clangtidy cleanups. No functional change intended (#7343) 2026-03-27 23:14:18 -04:00
verilated_profiler.cpp Internals: clangtidy cleanups. No functional change intended (#7343) 2026-03-27 23:14:18 -04:00
verilated_profiler.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_random.cpp Fix biased bit distribution under value < (1 << N) constraints (#7563) (#7684) 2026-05-30 13:00:35 -04:00
verilated_random.h Fix biased bit distribution under value < (1 << N) constraints (#7563) (#7684) 2026-05-30 13:00:35 -04:00
verilated_saif_c.cpp Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_saif_c.h Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_saif_sc.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_save.cpp Internals: clangtidy cleanups. No functional change intended (#7343) 2026-03-27 23:14:18 -04:00
verilated_save.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_sc.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_sc_trace.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_std.sv IEEE-compliant, fair `std::semaphore` (#7435) (#7605) 2026-05-18 11:11:42 +02:00
verilated_std_waiver.vlt Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_sym_props.h Support VPI access to unpacked struct members (#7823) 2026-06-23 07:04:51 -04:00
verilated_syms.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_threads.cpp Fix cpu pinning when no 'core id' present in /proc/cpuinfo (#7599) 2026-05-15 14:45:39 -04:00
verilated_threads.h Internals: clangtidy cleanups. No functional change intended (#7343) 2026-03-27 23:14:18 -04:00
verilated_timing.cpp Support per-process RNG for process::srandom() and object seeding (#7408) (#7415) 2026-04-13 13:58:53 -04:00
verilated_timing.h Fix `$finish` to immediately stop executing code from non-final blocks (#7213 partial) (#7390). 2026-04-09 17:49:57 -04:00
verilated_trace.h Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_trace_imp.h Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_types.h Support assoc array methods with wide value types (#7680) 2026-06-10 09:39:43 -04:00
verilated_vcd_c.cpp Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_vcd_c.h Fix erroneous implicit conversions of VlWide (#7642) 2026-05-22 20:05:08 +01:00
verilated_vcd_sc.cpp Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_vcd_sc.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilated_vpi.cpp Support VPI access to unpacked struct members (#7823) 2026-06-23 07:04:51 -04:00
verilated_vpi.h Add SPDX copyright identifiers, and get 'reuse' clean. No functional change. 2026-01-26 20:24:34 -05:00
verilatedos.h Support printing enum names for %p and %s (#5523) (#7338 repair) (#7521) (#7527) 2026-06-03 14:55:00 -04:00
verilatedos_c.h Support TERMUX (#7559). [Laurent CHARRIER] 2026-05-10 08:20:32 -04:00