Code that compares an expression (or variable) against many different
constants is fairly common, e.g. a one-hot decoder:
```systemverilog
assign oneHot = {x == 3, x == 2, x == 1, x == 0};
```
If the width of the expression is sufficiently large, the generated code can
blow up a GCC pass, taking an egregious amount of memory and time to compile.
Add a new DFG pass that generates a cheap one-hot decoder table to compute
these comparisons:
```systemverilog
wire [$bits(x)-1:0] idx = <the expression being compared many times>;
reg tab [1<<$bits(x)] = '{default: 0};
reg [$bits(x)-1:0] pre = '0;
always_comb begin
tab[pre] = 0;
tab[idx] = 1;
pre = idx; // This assignment is marked to avoid a false UNOPTFLAT
end
```
We then replace the comparisons `x == CONST` with `tab[CONST]`.
This is generally performance neutral, but avoids the compile time and memory
blowup with GCC (128GB+ -> 1GB in one example).
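
As a rough software model of the table update above (a sketch only, not the
code Verilator emits; `OneHotTable` and `tableMatchesComparisons` are
hypothetical names chosen for illustration), the following C++ checks that
`tab[c]` agrees with `x == c` after every update, including when the same
index is written twice in a row:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the generated table; names are illustrative only.
template <std::size_t Width>
struct OneHotTable {
    std::array<bool, (std::size_t{1} << Width)> tab{};  // all entries start at 0
    std::size_t pre = 0;  // index set by the previous update

    // Mirrors the always_comb body: clear the old entry, then set the new one.
    void update(std::size_t idx) {
        assert(idx < tab.size());
        tab[pre] = false;
        tab[idx] = true;
        pre = idx;
    }
};

// Returns true if, after each update, tab[c] equals (x == c) for every c.
template <std::size_t Width>
bool tableMatchesComparisons() {
    constexpr std::size_t n = std::size_t{1} << Width;
    OneHotTable<Width> t;
    for (std::size_t step = 0; step < 4 * n; ++step) {
        const std::size_t x = (step * 7 + 3) % n;  // scattered visiting order
        t.update(x);
        t.update(x);  // a repeated index must leave the entry set
        for (std::size_t c = 0; c < n; ++c) {
            if (t.tab[c] != (x == c)) return false;
        }
    }
    return true;
}
```

Each update touches at most two table entries regardless of how many
constants are compared, which is why a comparison can become a single lookup.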
We do not apply this if the comparisons seem to be part of a
`COMPARE ? val : COND` conditional tree, which C++ compilers can turn into
jump tables.
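
For illustration, a conditional tree of this kind might look like the
following in C++ (a hypothetical sketch; `selectTree`, `selectSwitch` and the
constants are made up, and whether a given compiler actually emits a jump
table depends on the compiler and its optimization settings):

```cpp
#include <cstdint>

// Conditional tree: each comparison either selects a value or falls through
// to the next comparison.
std::uint32_t selectTree(std::uint32_t x) {
    return x == 0 ? 10u
         : x == 1 ? 20u
         : x == 2 ? 30u
         : x == 3 ? 40u
                  : 0u;
}

// The same selection written as a switch, the form a compiler may lower
// to a jump table or lookup table.
std::uint32_t selectSwitch(std::uint32_t x) {
    switch (x) {
    case 0: return 10u;
    case 1: return 20u;
    case 2: return 30u;
    case 3: return 40u;
    default: return 0u;
    }
}
```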
This enables all XiangShan configurations from RTLMeter to build with GCC,
so this patch also enables them in the nightly runs.
To run scheduled instances of the RTLMeter or coverage workflows, the
ENABLE_SCHEDULED_JOBS variable must explicitly be set to 'true' in the
repository settings. This lets each fork decide whether to run the
scheduled instances.
Add the GitHub Actions workflows for running RTLMeter.
Runs start daily, at 02:00 UTC, on ubuntu-24.04. There are 2 runs:
- Using GCC, with default Verilator options
- Using Clang, with "--threads 4"
Each run uses a maximum of 2 runners in parallel (so max 4 in total),
and takes slightly over 2 hours to complete.
The jobs will fail if a benchmark is broken, so this already serves as a
regression test for the included designs.
For now, performance metrics are recorded as artefacts of the run but
not otherwise published.
Performance metrics are always recorded for all successful jobs, even if
some cases are failing.