Pack the elements of VlTriggerVec as dense bits (instead of a 1 byte
bool per bit), and check whether they are set on a word granularity.
This effectively transforms conditions of the form `if (trig.at(0) |
trig.at(2) | trig.at(64))` into `if (trig.word(0) & 0x5 | trig.word(1) &
0x1)`. This improves OpenTitan ST by about 1%, worth more on some other
designs.
Folding an AstLogAnd with a non-zero constant operand used to coerce the
type of the other operand, yielding an ill-typed node that DFG was then
unhappy about. Add a RedOr instead if the width of the replacement
operand is greater than zero.
Fixes#3726
Apart from the representational changes below, this patch renames
AstNodeMath to AstNodeExpr, and AstCMath to AstCExpr.
Now every expression (i.e.: those AstNodes that represent a [possibly
void] value, with value being interpreted in a very general sense) has
AstNodeExpr as a super class. This necessitates the introduction of an
AstStmtExpr, which represents an expression in statement position, e.g :
'foo();' would be represented as AstStmtExpr(AstCCall(foo)). In exchange
we can get rid of isStatement() in AstNodeStmt, which now really always
represent a statement
Peak memory consumption and verilation speed are not measurably changed.
Partial step towards #3420
- Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a
special case of `--dumpi-<tag>` where tag can be a magic word, or a
filename
- Control dumping via static `dump*()` functions, analogous to `debug()`
- Make dumping independent of the value of `debug()` (so dumping always
works even without the debug flag)
- Add separate `--dumpi-graph` for dumping V3Graphs, which is again a
special case of `--dumpi-<tag>`
- Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before
Introduce the @astgen directives parsed by astgen, currently used for
the generation child node (operand) accessors. Please see the updated
internal documentation for details.
Introduce the @astgen directives parsed by astgen, currently used for
the generation child node (operand) accessors. Please see the updated
internal documentation for details.
Adds timing support to Verilator. It makes it possible to use delays,
event controls within processes (not just at the start), wait
statements, and forks.
Building a design with those constructs requires a compiler that
supports C++20 coroutines (GCC 10, Clang 5).
The basic idea is to have processes and tasks with delays/event controls
implemented as C++20 coroutines. This allows us to suspend and resume
them at any time.
There are five main runtime classes responsible for managing suspended
coroutines:
* `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle`
with move semantics and automatic cleanup.
* `VlDelayScheduler`, for coroutines suspended by delays. It resumes
them at a proper simulation time.
* `VlTriggerScheduler`, for coroutines suspended by event controls. It
resumes them if its corresponding trigger was set.
* `VlForkSync`, used for syncing `fork..join` and `fork..join_any`
blocks.
* `VlCoroutine`, the return type of all verilated coroutines. It allows
for suspending a stack of coroutines (normally, C++ coroutines are
stackless).
There is a new visitor in `V3Timing.cpp` which:
* scales delays according to the timescale,
* simplifies intra-assignment timing controls and net delays into
regular timing controls and assignments,
* simplifies wait statements into loops with event controls,
* marks processes and tasks with timing controls in them as
suspendable,
* creates delay, trigger scheduler, and fork sync variables,
* transforms timing controls and fork joins into C++ awaits
There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`)
that integrate static scheduling with timing. This involves providing
external domains for variables, so that the necessary combinational
logic gets triggered after coroutine resumption, as well as statements
that need to be injected into the design eval function to perform this
resumption at the correct time.
There is also a function that transforms forked processes into separate
functions.
See the comments in `verilated_timing.h`, `verilated_timing.cpp`,
`V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals
documentation for more details.
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
* Tests: Add a test to reproduce #3509
* Tests: Compile without tautological-compare check because bit op tree optimization is disabled in the test.
* Internals: Dedup code. No functional change is intended.
* Fix#3509.
"2'b10 == (2'b11 & {1'b0, val[0]})" and "2'b10 != (2'b11 & {1'b0, val[0]})" were
wrongly optimized to "!val[0]" and "val[0]" respectively.
Now properly optimize them to 1'b0 and 1'b1.
* Commentary
* Commentary: Update Changes
* Tests: Add a test to reproduce #3470
* Update LSB during return path of traversal. No functional change is intended.
* Introduce LeafInfo::m_msb
* Update LeafInfo::m_msb when visitin AstCCast
* Internals: Add comment, reorder. No functional change is intended.
* Delete explicit from copy constructor to fix build error.
* Update Changes
* Internals: Remove unused parameter. No functional change is intended.
* Tests: Add explanation to t_const_opt.
* Internals: Let LeafInfo class. No functional change is intended.
* Internals: Rename LeafInfo::width -> LeafInfo::varWidth(). No functional change is intende.
* Tests: Check BitOpTree statistics in t_const_opt.
* Tests: Add a test to reproduce #3445
* Fix#3445. Don't forget LSB of frozen node in BitOpTreeOpt.
* Apply suggestions from code review
Co-authored-by: Geza Lore <gezalore@gmail.com>
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.
With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).
Details of the new scheduling model and algorithm are provided in
docs/internals.rst.
Implements #3278
- Add more tests, including for tracing.
- Apply some cleaner, more generic abstractions in the implementation.
- Use clearer AstRelease which is not an assignment.
* Add a test to reproduce #3197
* Fix#3197. Optimize correctly even if a variable is >32
* Quick exit instead of continue. No functional change is intended.
* Delete comment-out line.
* update per review comment
* Add a test to reproduce bug3182. Run the same HDL with -Oo to confirm the result is same.
* Hopefully fix#3182. The result can be 0 only when polarity is true (no AstNot is found during traversal).
* add tests to reproduce #3177.
Any random test circuits can be added to t_split_var_4.v later because it uses CRC to check the result while
t_split_var_0.v has just barrel shifters.
* Fix#3177. Don't merge assign statements if a variable is marked split_var.
The _CONST suffix on these macros is only lexical notation, pointer
constness can be preserved by overloading the underlying
implementations appropriately. Given that the compiler will catch
invalid const usage (trying to assign a non-const pointer to a const
pointer variable, etc.), and that the declarations of symbols should
make their constness obvious, I see no reason to keep the _CONST
flavours.
Fail at compile time if the result of these macros can be statically
determined (i.e.: they aways succeed or always fail). Remove unnecessary
casts discovered. No functional change.
- Remove redundant casting
- Cheaper final XOR parity flip (~/^1 instead of != 0)
- Support XOR reduction of ~XOR nodes
- Don't add redundant masking of terms
- Support unmasked terms
- Add cheaper implementation for single bit only terms
- Ensure result is clean under all circumstances (this fixes current bugs)
* Add a test to reproduce #3023. Also applied verilog-mode formatting.
* use unique_ptr. No functional change is intended.
* Introduce restorer that reverts changes during iterate() if failed.
AND(CONST,SHIFTR(_,C)) appears often after V3Expand, with C a large
enough dense mask (i.e.: of the form (1 << n) - 1) to make the masking
redundant. E.g.: 0xff & ((uint32_t)a >> 24). V3Const now replaces these
ANDs with the SHIFTR node.
Similarly, we also simplify the same with SHIFTL,
e.g.: 0xff000000 & ((uint32_t)a << 24)
V3Expand generates a lot of OR nodes that are under a clearing mask, and
have redundant terms, e.g.: 0xff & (a << 8 | b >> 24). The 'a << 8' term
in there is redundant as it's bottom bits are all zero where the mask is
non-zero. V3Const now removes these redundant terms.
* Tests: Add more case that does not match native C++ width (8, 16, 32 or 64).
* Use AstVarRef::same() instead of AstNode::sameGateTree() because the latter checks dtype in addition to scope.
AstVarRef may have different minWidth in some cases,
but the difference should be ignored in the context of bitOpTree optimization.
* Tests: Add another testcase that triggers assertion failure in bitOpTree opt.
* Fix assertion failure in bitOpTree opt reported in #2891. Consider the follwoing case.
CCast -> WordSel -> VarRef(leaf)
* Make sure that m_bitPolarity is expanded enough.
The intention was to not merge impure assignments, but the actual
predicate failed if the assignment was indeed pure.
This fix gains 1.5% speed on SweRV EH1.
* Tests:Add some more signals to t_const_opt_red.v
* Optimize bit op trees such as ~a[2] & a[1] & ~a[0] to 3'b010 == (3'111 & a)
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Apply clang-format
* Don't edit and-or tree.
* Call matchBitOpTree() after V3Expand does its job.
* Internals: Rename newNodep -> newp. No functional change is intended.
* Internals: Remove m_sels. No functional change is intended.
* Internals: Remove stringstream. No functional change is intended.
* Internals: Use V3Number::bitIs1(). No functional change is intended.
* Internals: Use V3Number instead of std::map. Resolved laek. Result should be same.
* Internals: Resolve overload of setPolarity. No functional change is intended.
* Internals: Pass failure reason. No functional change is intended.
* Internals: Add VNUser::toPtr(). No functional change is intended.
* Internals: Use user4 instead of std::map. No functional change is intended.
* Catch up with the AST style aftre V3Expand
* Internals: Rename Context to VarInfo. No functional change is intended.
* Add some more test case
* tests:Add stats to tests
Update stats in t_merge_cond.pl as matchBitopTree does some of them.
* insert CCast if necessary
* small optimization to remove redundant bit mask
* No quick exit even when unoptimizable node is found.
* Simplify removing redundant And
* simplify
* Revert "Internals: Add VNUser::toPtr(). No functional change is intended."
This reverts commit f98dce10db.
* Consider AstWordSel and cleanup
* Update test
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Apply clang-format
* Internals: rename variables. No functional change is intended.
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
Co-authored-by: github action <action@example.com>