Doing ABC runs in parallel can actually make things slower when every ABC run requires
spawning an ABC subprocess --- especially when using popen(), which on glibc does not
use vfork(). What seems to happen is that constant fork()ing keeps making the main
process data pages copy-on-write, so the main process code that is setting up each ABC
call takes a lot of minor page-faults, slowing it down.
The solution is pretty straightforward although a little tricky to implement.
We just reuse ABC subprocesses. Instead of passing the ABC script name on the command
line, we spawn an ABC REPL and pipe a command into it to source the script. When that's
done we echo an `ABC_DONE` token instead of exiting. Yosys then puts the ABC process
onto a stack which we can pull from the next time we do an ABC run.
For one of our large designs, this is an additional 5x speedup of the primary AbcPass.
It does 5155 ABC runs, all very small; runtime of the AbcPass goes from 760s to 149s
(not very scientific benchmarking but the effect size is large).
Large circuits can run hundreds or thousands of ABCs in a single AbcPass.
For some circuits, some of those ABC runs can run for hundreds of seconds.
Running ABCs in parallel with each other and in parallel with main-thread
processing (reading and writing BLIF files, copying ABC BLIF output into
the design) can give large speedups.
When FILTERLIB is defined (attempts to compile libparse more or less standalone,) mark the `LibertyParser::error()` as weak so utilities using libparse as a library can override its behavior (the default behavior being exit(1)). As the code is quite performance-critical, I've elected to not modify it to raise an exception or have a callback or similar and simply allow for a link-time replacement.
Currently `assign_map` is rebuilt from the module from scratch every time we invoke ABC.
That doesn't scale when we do thousands of ABC runs over large modules. Instead,
create it once and then maintain incrementally it as we update the module.
- Make IdString parameter pass by const ref to avoid IdString ref counting in the constructor
- Remove extra std::string allocation to remove prefix
- Avoid using `stringf` for concatenation
Partially reverts commit 9c5bffcf93.
The reasoning behind this is that setup.py is intended to strictly consume the Makefile and not be consumed by it. The attempt at using them recursively has caused a number of issues and has rendered Pyosys unusable to some users: See https://github.com/YosysHQ/yosys/issues/5012
Additionally, unlike the previous pyosys installation target, the wheel installation does not respect PREFIX=, only venvs.
For installation inside a venv, the intended method should remain a user manually executing `pip3 install .` instead of relying on the Makefile.
This adds optional in-memory caching of parsed liberty files to speed up
flows that repeatedly parse the same liberty files. To avoid increasing
the memory overhead by default, the caching is disabled by default. The
caching can be controlled globally or on a per path basis using the new
`libcache` command, which also allows purging cached data.
These were introduced by 0a6d9f4.
1) While in a paren "(", don't error on newline.
2) Don't parse an extra token when parsing vector ranges. Let the caller parse the next token as necessary.
This extends the `LibertyInputStream` added in the previous commit to
allow arbitrary lookahead. Then this uses the lookahead to find the
total length of the token within the input buffer, instead of consuming
the token byte by byte while appending to a std::string. Constructing
the std::string with the total length is known avoids any reallocations
from growing std::string's buffer.
The lexer for liberty files was using istream's `get` and `unget` which
are notorious for bad performance and that showed up during profiling.
This replaces the direct `istream` use with a custom LibertyInputStream
that does its own buffering to provide `get` and `unget` that behave the
same way but are implemented with a fast path that is easy to inline and
optimize.
The B port is for single-bit summands. These can just as well be
represented as an additional summand on the A port (which supports
summands of arbitrary width). An upcoming `$macc_v2` cell won't be
special-casing single-bit summands in any way.
In preparation, make the following changes:
* remove the `bit_ports` field from the `Macc` helper (instead add any
single-bit summands to `ports` next to other summands)
* leave `B` empty on cells emitted from `Macc::to_cell`
Previously the `abc9_box` mode was reserved to modules with the
`blackbox` or `whitebox` attribute. Allow `abc9_box` on ordinary modules
when doing hierarchical synthesis.
`abc9_ops -prep_box` command interprets the `abc9_box` attribute and
prepares a .box file for ABC consumption. Previously this command was
removing the attribute as it was processing each module which prevented
repeated invocation of this command unless the box definitions were
refreshed from a source file.
Also the command was keeping existing `abc9_box_id` attributes instead
of overwriting them with values from a new number sequence.
Change both behaviors to allow repeated invocations of the command on
the same design.