verilator/test_regress
Thomas Santerre bd6b9161dc
Optimize bit-scan loops into $mostsetbitp1 / $countones (#7822)
Recognize the common single-bit scan loop idioms in V3Unroll (before it
unrolls) and lower them to bit-reduction primitives, replacing a literal
W-iteration loop with one intrinsic-backed expression:

  target=0; for (i=0;i<W;i++) if (vec[i]) target = i + 1;      -> $mostsetbitp1(vec)
  target=0; for (i=0;i<W;i++) if (vec[i]) target = target + 1; -> $countones(vec)
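
As a rough illustration of what the two idioms compute, the loops above can be modelled in C++ for W = 32. This is only a sketch: the function names are hypothetical and are not part of Verilator. The first loop yields the index of the highest set bit plus one (0 for an all-zero vector), the second yields the number of set bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ models of the two loop idioms, with W fixed at 32.

// target=0; for (i=0;i<W;i++) if (vec[i]) target = i + 1;
uint32_t scan_most_set_bit_p1(uint32_t vec) {
    uint32_t target = 0;
    for (uint32_t i = 0; i < 32; i++) {
        if ((vec >> i) & 1u) target = i + 1;  // last hit wins: highest set bit
    }
    return target;
}

// target=0; for (i=0;i<W;i++) if (vec[i]) target = target + 1;
uint32_t scan_count_ones(uint32_t vec) {
    uint32_t target = 0;
    for (uint32_t i = 0; i < 32; i++) {
        if ((vec >> i) & 1u) target = target + 1;  // one increment per set bit
    }
    return target;
}

// Small self-check of the values the idioms produce.
void check_scan_idioms() {
    assert(scan_most_set_bit_p1(0x00000000u) == 0);   // no bit set
    assert(scan_most_set_bit_p1(0x00000001u) == 1);   // bit 0 set
    assert(scan_most_set_bit_p1(0x00000028u) == 6);   // bits 3 and 5 set
    assert(scan_most_set_bit_p1(0x80000000u) == 32);  // bit 31 set
    assert(scan_count_ones(0x00000028u) == 2);
    assert(scan_count_ones(0xFFFFFFFFu) == 32);
}
```

Calling check_scan_idioms() exercises both models; each loop body runs 32 times, which is the cost the lowering is meant to remove.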

The leading-one form lowers to a new AstMostSetBitP1 node, emitted as
VL_MOSTSETBITP1_{I,Q,W}; those runtime helpers now use __builtin_clz where
available (same pattern as VL_REDXOR's __builtin_parity), with the existing
bit scan as fallback.  The count-ones form reuses AstCountOnes ($countones,
popcount); because the DFG requires a 32-bit countones result, the node is
built at 32 bits and narrowed to the accumulator width with a select.
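
A minimal sketch of how such helpers might look, assuming a GCC/Clang-style __builtin_clz and a plain shift loop as the fallback. The function names and the word layout of the wide variant (least significant 32-bit word first) are assumptions for illustration; this is not the actual VL_MOSTSETBITP1_{I,Q,W} code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit helper: index of the highest set bit plus one, 0 if none.
uint32_t most_set_bit_p1_32(uint32_t v) {
    if (v == 0) return 0;  // __builtin_clz(0) is undefined, so handle zero first
#if defined(__GNUC__) || defined(__clang__)
    return 32u - static_cast<uint32_t>(__builtin_clz(v));
#else
    uint32_t n = 0;  // fallback: shift right until the value is exhausted
    while (v != 0) { ++n; v >>= 1; }
    return n;
#endif
}

// Hypothetical 64-bit helper, same idea with the long long builtin.
uint32_t most_set_bit_p1_64(uint64_t v) {
    if (v == 0) return 0;
#if defined(__GNUC__) || defined(__clang__)
    return 64u - static_cast<uint32_t>(__builtin_clzll(v));
#else
    uint32_t n = 0;
    while (v != 0) { ++n; v >>= 1; }
    return n;
#endif
}

// Hypothetical wide helper: words[0] is assumed to hold bits 31..0.
uint32_t most_set_bit_p1_wide(const uint32_t* words, int nwords) {
    for (int i = nwords - 1; i >= 0; --i) {  // scan from the top word down
        if (words[i] != 0) {
            return static_cast<uint32_t>(i) * 32u + most_set_bit_p1_32(words[i]);
        }
    }
    return 0;
}

// Small self-check, including an 80-bit value held in three words.
void check_most_set_bit_helpers() {
    assert(most_set_bit_p1_32(0x00000028u) == 6);
    assert(most_set_bit_p1_64(uint64_t(1) << 40) == 41);
    const uint32_t top_bit_of_80[3] = {0u, 0u, 0x00008000u};  // bit 79 set
    assert(most_set_bit_p1_wide(top_bit_of_80, 3) == 80);
}
```

The zero check comes first in both scalar helpers because the clz builtins leave a zero argument undefined, while the scan loop idiom defines that case as 0.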

Matching is structural to stay sound: the index must start at 0, increment
by exactly 1, and scan all W==width(vec) bits via a single 1-bit select of a
distinct vector, with the target pre-zeroed and no else branch.  The loop
bound must be a strict ascending comparison, written as either 'idx < W' or
'W > idx', signed or unsigned (Lt/LtS/Gt/GtS).  The optimization is gated by
-fbit-scan-loops (on at -O).

Adds t_bit_scan_loops (I/Q/W, count-ones and unsigned-index positives;
step-2, start-1, idx*2+1, vec[idx+1], target=idx and W!=width negatives, all
self-checked and asserted via --stats not to lower) plus t_bit_scan_loops_off
for the disable flag.
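
To see why near-miss shapes such as these must not be lowered, the following sketch models three of them in C++ (step-2, start-1 and target=idx) and compares them with the full scan. The functions are hypothetical illustrations and are not the contents of t_bit_scan_loops.

```cpp
#include <cassert>
#include <cstdint>

// Reference: the accepted idiom (start 0, step 1, target = i + 1), W = 32.
uint32_t ref_most_set_bit_p1(uint32_t vec) {
    uint32_t target = 0;
    for (uint32_t i = 0; i < 32; i++) {
        if ((vec >> i) & 1u) target = i + 1;
    }
    return target;
}

// Step-2 variant: odd-numbered bits are never inspected.
uint32_t scan_step2(uint32_t vec) {
    uint32_t target = 0;
    for (uint32_t i = 0; i < 32; i += 2) {
        if ((vec >> i) & 1u) target = i + 1;
    }
    return target;
}

// Start-1 variant: bit 0 is never inspected.
uint32_t scan_start1(uint32_t vec) {
    uint32_t target = 0;
    for (uint32_t i = 1; i < 32; i++) {
        if ((vec >> i) & 1u) target = i + 1;
    }
    return target;
}

// target=idx variant: stores the index itself, not the index plus one.
uint32_t scan_target_idx(uint32_t vec) {
    uint32_t target = 0;
    for (uint32_t i = 0; i < 32; i++) {
        if ((vec >> i) & 1u) target = i;
    }
    return target;
}

// Each variant disagrees with the reference on at least one input.
void check_negative_shapes() {
    assert(ref_most_set_bit_p1(0x2u) == 2 && scan_step2(0x2u) == 0);
    assert(ref_most_set_bit_p1(0x1u) == 1 && scan_start1(0x1u) == 0);
    assert(ref_most_set_bit_p1(0x1u) == 1 && scan_target_idx(0x1u) == 0);
}
```

Each variant differs from the accepted idiom in one small detail of the loop header or body, yet returns a different value for some input, so replacing it with the intrinsic would change behaviour.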

Motivated by a transformer inference design whose 80-bit leading-one detector
ran every cycle (~37% of runtime); the lowering is worth ~39% there.
2026-06-24 10:43:05 +01:00
t Optimize bit-scan loops into $mostsetbitp1 / $countones (#7822) 2026-06-24 10:43:05 +01:00
.gdbinit
.gitignore
AGENTS.md CI: Autoformat markdown files 2026-06-15 17:44:50 -04:00
CMakeLists.txt Remove multi-threaded FST tracing (#7443) 2026-04-19 16:02:12 +01:00
Makefile Test: Remove old Makefile rules 2026-04-13 21:09:09 -04:00
Makefile_obj
driver.py Apply 'make format' 2026-06-02 20:47:02 +00:00
input.vc
input.xsim.vc