Doing ABC runs in parallel can actually make things slower when every ABC run requires
spawning an ABC subprocess --- especially when using popen(), which on glibc does not
use vfork(). What seems to happen is that constant fork()ing keeps making the main
process data pages copy-on-write, so the main process code that is setting up each ABC
call takes a lot of minor page-faults, slowing it down.
The solution is pretty straightforward although a little tricky to implement.
We just reuse ABC subprocesses. Instead of passing the ABC script name on the command
line, we spawn an ABC REPL and pipe a command into it to source the script. When that's
done we echo an `ABC_DONE` token instead of exiting. Yosys then puts the ABC process
onto a stack which we can pull from the next time we do an ABC run.
For one of our large designs, this is an additional 5x speedup of the primary AbcPass.
It does 5155 ABC runs, all very small; runtime of the AbcPass goes from 760s to 149s
(not very scientific benchmarking but the effect size is large).
Large circuits can run hundreds or thousands of ABCs in a single AbcPass.
For some circuits, some of those ABC runs can run for hundreds of seconds.
Running ABCs in parallel with each other and in parallel with main-thread
processing (reading and writing BLIF files, copying ABC BLIF output into
the design) can give large speedups.
The YosysHQ Verific Extensions are compiled separately using their own
stripped-down version of the Yosys headers. To maintain ABI
compatibility with older extension builds post C++-ification of Yosys's
logging APIs, which are backwards compatible on the API but not ABI
level, this commit adds ABI compatible versions of a subset of the old
logging API used by the extensions.
The ID(OVERFLOW) IdString isn't used widely enough that we require a
statically allocated IdString, but I think it's good to have an example
workaround in place in case more collisions come up.
I've used this shell command to obtain the list:
rg -I -t cpp -t yacc -o \
'ID\((\$?[a-zA-Z0-9_]+)\)|ID::($?[a-zA-Z0-9_]+)' -r 'X($1$2)' \
| LC_ALL=C sort -u
This removed the entries X(_TECHMAP_FAIL_) and X(nomem2init).
The vast majority of ID(...) uses are in a context that is overloaded
for StaticIdString or will cause implicit conversion to an IdString
constant reference. For some sufficently overloaded contexts, implicit
conversion may fail, so it's useful to have a method to force obtaining
a `IdString const &` from an ID(...) use.
When turning all literal IdStrings of the codebase into StaticIdStrings
this was needed in exactly one place, for which this commit adds an
`id_string()` call.
The simple XOR `commutative_eat()` implementation produces a lot of collisions.
https://www.preprints.org/manuscript/201710.0192/v1/download is a useful reference on this topic.
Running the included `hashTest.cc` without the hashlib changes, I get 49,580,349 collisions.
The 49,995,000 (i,j) pairs (0 <= i < 10000, i < j < 10000) hash into only 414,651 unique hash values.
We get simple collisions like (0,1) colliding with (2,3).
With the hashlib changes, we get only 707,099 collisions and 49,287,901 unique hash values.
Much better! The `commutative_hash` implementation corresponds to `Sum(4)` in the paper
mentioned above.
`CellTypes::eval()` is more generic but also more limited. `ConstEval::eval()` requires more setup (both in code and at runtime) but has more complete support.
Still unsupported:
- wide muxes (`$_MUX16_` and friends)
Partially supported types have comments in `test_cell.cc`.
Fix `CellTypes::eval() for `$_NMUX_`.
Fix `RTLIL::Cell::fixup_parameters()` for $concat, $bwmux and $bweqx.
Conditionally include help source tracking to preserve ABI.
Docs builds can (and should) use `ENABLE_HELP_SOURCE` so that the generated sphinx docs can perform default grouping and link to source files.
Regular user-builds don't need the source tracking.
`_content` vector owns elements so that when the `ContentListing` is deleted so is the content.
Remove `get_content()` method in favour of `begin()` and `end()` const iterators.
More `const` in general, and iterations over `ContentListing` use `&content` to avoid copying data.
dict is pretty slow when you don't ever need to iterate the container in
order. And the hashfunction for char* in dict hashes for every single
byte in the string, likely doing significantly more work than std::hash.
Checking only happens at compile time if -std=c++20 (or greater) is enabled. Otherwise
the checking happens at run time.
This requires the format string to be a compile-time constant (when compiling with
C++20), so fix a few places where that isn't true.
The format string behavior is a bit more lenient than C printf. For %d/%u
you can pass any integer type and it will be converted and output without
truncating bits, i.e. any length specifier is ignored and the conversion is
always treated as 'll'. Any truncation needs to be done by casting the argument itself.
For %f/%g you can pass anything that converts to double, including integers.
Performance results with clang 19 -O3 on Linux:
```
hyperfine './yosys -dp "read_rtlil /usr/local/google/home/rocallahan/Downloads/jpeg.synth.il; dump"'
```
C++17 before: Time (mean ± σ): 101.3 ms ± 0.8 ms [User: 85.6 ms, System: 15.6 ms]
C++17 after: Time (mean ± σ): 98.4 ms ± 1.2 ms [User: 82.1 ms, System: 16.1 ms]
C++20 before: Time (mean ± σ): 100.9 ms ± 1.1 ms [User: 87.0 ms, System: 13.8 ms]
C++20 after: Time (mean ± σ): 97.8 ms ± 1.4 ms [User: 83.1 ms, System: 14.7 ms]
The generated code is reasonably efficient. E.g. with clang 19, `stringf()` with a format
with no %% escapes and no other parameters (a weirdly common case) often compiles to a fully
inlined `std::string` construction. In general the format string parsing is often (not always)
compiled away.
If you have a large design with a lot of modules and you use the Verilog
backend to emit modules one at a time to separate files, performance is
very low. The problem is that the Verilog backend calls `design->sort()`
every time, which sorts the contents of all modules, and this is slow
even when everything is already sorted.
We can easily fix this by only sorting the contents of modules that
we're actually going to emit.
Linking to optiongroups doesn't add *that* much, and is kind of a pain; meanwhile having the optiongroups adds an extra level of indentation.
Instead of options needing to be in an option group, they instead go in either the root node or nested in a usage node. Putting them in a usage node allows for more-or-less the previous behaviour but without making it the default.
Or at least all of the synth commands, because they have the same preceding text.
Except `synth_fabulous`, because that has unconditional `read_verilog`s at the start. Which seems like a bad idea and and not compatible with `-run`?
Add `options` map for setting `ContentListing` options with key:val pairs; currently only used with `ContentListing::codeblock()` language arg.
Fix generated codeblock rst to print each line of code with indentation, and change it to use explicit an `code-block` so we can set a language on it.
Keep techlibs folder hierarchy.
techlibs/* and passes/* groups are now nested under index_techlibs and index_passes respectively, with most (all?) of the passes/* pages getting proper headings, as well as backends/frontends/kernel. `index_passes_techmap` also references `index_techlibs`.
Split command reference toc in twain, one with maxdepth=2 and one with maxdepth=3, since passes and techlibs now have an extra level of nesting.
Move the `cmd_ref` link to the command reference, instead of top of the page.
Remove `index_internal` and `index_other` from the toc, and mark the pages as orphan. Internal commands get a note callout after the command reference toc (although this doesn't work for the pdf build), while other commands are linked in the warning for missing `source_location` (since that *should* be the only time when there are any commands in the "unknown" group).
Update autodoc extension versions, and mark the directives extension as not `parallel_read_safe` (it might be, but I'm not sure about how the xref lookups work if it is parallel so better to be safe).
Commands flagged as internal will display a warning, just like experimental commands.
Drop `passes/tests` group in favour of `internal` group, which is automatically assigned for any command without an assigned group which is flagged as internal.
Update experimental pass warnings to use a shared function. Reduces repetition, and also allows all of the warning flags to be combined (which at present is just experimental and the new internal).
Update `test_*` passes to call `internal()` in their constructors.
Content is now added to the `ContentListing` rather than the `PrettyHelp`.
`open_*` methods return the `ContentListing` that was added instead of leaving a hanging continuation.
This allows for (e.g.) options to be added directly to optiongroups, instead of requiring that groups be closed before continuation.
This also means that all `PrettyHelp`s are a listing, with the actual log being called by the default `Pass::help()`; making the mode field redundant.
Added `PrettyHelp::log_help()` which replaces the `PrettyHelp::Mode::LOG` logic.
Added `ContentListing::back()` which just returns the last element of the underlying content vector.
Some of the content tracking was made redundant and removed, in particular `PrettyHelp::_current_listing` and `ContentListing::parent`.
Converted `ContentListing` to a class instead of a struct, adjusting constructors to match.
Added `ContentListing` constructor that accepts a `source_location`.
Update `HelpPass::dump_cmds_json()` for new log_help.