verilator

Commit Graph

Author	SHA1	Message	Date
Wilson Snyder	880cac2fdd	Merge branch 'master' into develop-v5	2022-10-01 11:24:55 -04:00
github action	a204b24fcf	Apply 'make format'	2022-10-01 15:06:12 +00:00
Marcel Chang	526e6b9fc7	Add --dump-tree-dot to enable dumping Ast Tree .dot files (#3636 )	2022-10-01 11:05:33 -04:00
github action	f1ba6cb517	Apply 'make format'	2022-10-01 14:53:40 +00:00
Kanad Kanhere	159cf0429c	Support linting for top module interfaces (#3635 )	2022-10-01 10:48:37 -04:00
Ryszard Rozak	46b8dca360	Add handling of tristate select/extend (#3604 )	2022-10-01 10:34:30 -04:00
Geza Lore	cc51966ad1	DFG: Remove unconnected variables early	2022-09-30 11:53:03 +01:00
Geza Lore	c9d6344f2f	DFG: Extract cyclic components separately A lot of optimizations in DFG assume a DAG, but the more things are representable, the more likely it is that a small cyclic sub-graph is present in an otherwise very large graph that is mostly acyclic. In order to avoid losing optimization opportunities, we explicitly extract the cyclic sub-graphs (which are the strongly connected components + anything feeding them, up to variable boundaries) and treat them separately. This enables optimization of the remaining input.	2022-09-30 09:51:10 +01:00
Geza Lore	acebafcbc2	DFG: Partial support for unpacked arrays Representation and Ast / Dfg conversions available, for element-wise access only. Not much optimization yet (only CSE).	2022-09-29 19:00:45 +01:00
Geza Lore	4a1a2def95	DFG: make variable inlining part of the peephole optimizer This saves some traversals and prepares us to better handle cyclic DFGs.	2022-09-29 18:40:10 +01:00
Geza Lore	09e352ef66	DFG: support hashing of graphs circular through variables No functional change	2022-09-29 18:40:10 +01:00
Geza Lore	17976d7401	DFG: fix REPLACE_EQ_OF_CONST_AND_CONST peephole pattern	2022-09-29 18:40:10 +01:00
Wilson Snyder	cd2a5771b8	Add --timing to --binary (#3625 ).	2022-09-28 19:02:23 -04:00
Krzysztof Bieganski	9c2ead90d5	Add custom memory management for verilated classes (#3595 ) This change introduces a custom reference-counting pointer class that allows creating such pointers from 'this'. This lets us keep the receiver object around even if all references to it outside of a class method no longer exist. Useful for coroutine methods, which may outlive all external references to the object. The deletion of objects is deferred until the next time slot. This is to make clearing the triggered flag on named events in classes safe (otherwise freed memory could be accessed).	2022-09-28 18:54:18 -04:00
Geza Lore	a999c73ce0	Commentary	2022-09-28 14:43:40 +01:00
Wilson Snyder	b92173bf3d	Add --binary option as alias of --main --exe --build (#3625 ).	2022-09-28 09:04:33 -04:00
Wilson Snyder	c6bce636ee	Merge branch 'master' into develop-v5	2022-09-27 22:19:04 -04:00
Wilson Snyder	75a70bee6d	Update to clang-format-14 on Ubuntu22.04	2022-09-27 21:47:45 -04:00
Ryszard Rozak	4931e48016	Support resolving assignments with equal strengths (#3637 )	2022-09-26 21:21:37 -04:00
Geza Lore	1b17acdb01	DFG: Support AstSel and AstConcat on LHS of assignments Added DfgVertexVariadic to represent DFG vertices with a varying number of source operands. Converted DfgVar to be a variadic vertex, with each driver corresponding to a fixed range of bits in the packed variable. This allows us to handle AstSel on the LHS of assignments. Also added support for AstConcat on the LHS by selecting into the RHS as appropriate. This improves OpenTitan ST speed by ~13%	2022-09-26 19:54:52 +01:00
Geza Lore	9c1cc5465d	DFG: Support packed structure and union types	2022-09-26 18:31:50 +01:00
Geza Lore	d8b5359fcb	Merge branch 'master' into develop-v5	2022-09-26 14:45:08 +01:00
Geza Lore	9da012568c	Ensure DFG stats are consistent	2022-09-26 14:38:26 +01:00
Geza Lore	9a20a258f5	Omit AstNode::m_editCount in release build This is only a debugging aid at this point, so compile out of the release build. This reduces peak memory consumption by 4-5%. We still keep the global counters to detect that the tree has changed, to avoid unnecessary dumps.	2022-09-25 08:57:33 +01:00
Geza Lore	10796457d2	V3Life: don't depend on AstNode::editCountGbl() No functional change intended.	2022-09-24 20:45:30 +01:00
Geza Lore	78e659a142	Reduce size of FileLine Multiple tricks to reduce the size of class FileLine from 72 to 40 bytes: - Reduce file name index from 32 to 16 bits. This still allows 64K unique input files, which is hopefully enough. - Intern message/warning enable bitset and use a 16-bit index, again allowing 64K unique sets which is hopefully enough. - Put the m_waive flag into the sign bit of one of the line numbers. - Use explicit reference counting to avoid overhead of shared_ptr. Added assertions to ensure interned data fits within its index space. This saves ~5-10% peak memory consumption at no measurable run-time cost on various designs.	2022-09-24 20:16:21 +01:00
Geza Lore	47bce4157d	Introduce DFG based combinational logic optimizer (#3527 ) Added a new data-flow graph (DFG) based combinational logic optimizer. Its capabilities cover a combination of V3Const and V3Gate, but it is also more capable of transforming combinational logic into simplified forms and more. This entails adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peephole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yield ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and a reduction of up to 30% in peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.	2022-09-23 16:46:22 +01:00
Geza Lore	3a8a314566	Merge branch 'master' into develop-v5	2022-09-23 11:21:12 +01:00
Geza Lore	050060b139	Make enum constructors and operators constexpr	2022-09-23 11:10:28 +01:00
Geza Lore	ddb678cc5b	Merge branch 'master' into develop-v5	2022-09-22 17:33:36 +01:00
Geza Lore	63c694f65f	Streamline dump control options - Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a special case of `--dumpi-<tag>` where tag can be a magic word, or a filename - Control dumping via static `dump*()` functions, analogous to `debug()` - Make dumping independent of the value of `debug()` (so dumping always works even without the debug flag) - Add separate `--dumpi-graph` for dumping V3Graphs, which is again a special case of `--dumpi-<tag>` - Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before	2022-09-22 17:24:41 +01:00
github action	12093e6939	Apply 'make format'	2022-09-21 19:22:15 +00:00
Geza Lore	9949a6cd17	Generate AstGen::checkTreeiter to enforce Ast op*p use Use astgen to generate a more thorough version of AstNode::checkTree, which checks that operands are of consistent structure and type, as described in the @astgen op directives. Also change checkTree to always run when --debug-check is given. Fix discovered fallout.	2022-09-21 18:12:11 +01:00
Geza Lore	4600932d8c	Remove unused files	2022-09-21 14:16:20 +01:00
Geza Lore	95145038b4	Generate AstNode accessors via astgen Introduce the @astgen directives parsed by astgen, currently used for the generation child node (operand) accessors. Please see the updated internal documentation for details.	2022-09-21 14:05:27 +01:00
Geza Lore	ce03293128	Generate AstNode accessors via astgen Introduce the @astgen directives parsed by astgen, currently used for the generation child node (operand) accessors. Please see the updated internal documentation for details.	2022-09-21 13:56:03 +01:00
Geza Lore	72e7271a14	Merge branch 'master' into develop-v5	2022-09-21 12:19:00 +01:00
Kamil Rakoczy	0b07679ff2	v3errorEnd: look for instance only when warning is not ignored (#3632 ) This approach reduced total time of V3Undriven stage from 34,2s to 2,5s in design containing almost 400 000 unused variables. Signed-off-by: Kamil Rakoczy <krakoczy@antmicro.com>	2022-09-21 10:54:23 +01:00
Wilson Snyder	d162619bd3	Merge branch 'master' into develop-v5	2022-09-20 20:06:21 -04:00
Wilson Snyder	5df14627fd	Fix 32-bit build of previous commit	2022-09-20 18:23:44 -04:00
Mariusz Glebocki	fc3ce29845	Improve Verilation memory by reducing V3Number size (#3521 )	2022-09-20 16:46:47 -04:00
Yu-Sheng Lin	bba800f2d6	Fix calling trace() after open() segfault (#3610 ) (#3627 )	2022-09-20 16:45:09 -04:00
Ryszard Rozak	fe2a1e1749	Remove assignments with strengths weaker than strongest non-tristate RHS (#3629 )	2022-09-19 04:54:20 -04:00
Wilson Snyder	fc4ffd454e	Rename --bin to --build-dep-bin.	2022-09-18 10:32:43 -04:00
Geza Lore	7bc7b5372e	Merge branch 'master' into develop-v5	2022-09-17 16:12:28 +01:00
Geza Lore	7d88e63bab	astgen: generate type specific addNext, remove astNextNull Generate type specific static overloads of Ast<Node>::addNext, which return the correct sub-type of the 'this' they were invoked on. Also remove AstNode::addNextNull, which is now only used in the parser, implement in verilog.y directly as a template function.	2022-09-17 15:05:22 +01:00
Wilson Snyder	a214fd1f78	Internals: Fix constructor syntax in new develop-v5 code	2022-09-17 08:56:41 -04:00
Wilson Snyder	79be097e34	Sort -V env variable output	2022-09-17 08:17:55 -04:00
Wilson Snyder	11b0d36ba2	Merge cleanups from 'develop-v5'. No functional change	2022-09-17 08:17:22 -04:00
Geza Lore	af305bf280	Merge branch 'master' into develop-v5	2022-09-16 16:24:36 +01:00
Geza Lore	38a8d7fb2e	Remove redundant 'inline' keywords from definitions Also add checks to t/t_dist_cppstyle	2022-09-16 15:52:25 +01:00
Geza Lore	0c70a0dcbf	Remove redundant 'virtual' keywords from overridden methods 'virtual' is redundant when 'override' is present, so keep only 'override'. Add t/t_dist_cppstyle.pl to check for this.	2022-09-16 15:19:38 +01:00
Geza Lore	d16619fe86	astgen: Explicitly generate AstNode members Generate boilerplate members of AstNode sub-types directly via astgen. This is in preparation for generating additional members.	2022-09-16 11:18:20 +01:00
Wilson Snyder	2dc85a5acd	Internals: enum constructor cleanups. No functional change intended.	2022-09-15 19:58:10 -04:00
Kamil Rakoczy	dbe1348b4c	Tests: Fix earlier commit, add build jobs to stats (#3623 ) (#3626 )	2022-09-15 11:29:50 -04:00
Geza Lore	22846df03e	Merge branch 'master' into develop-v5	2022-09-15 14:01:19 +01:00
Wilson Snyder	d74536a4dc	Internals: Cleanup some constructors. No functional change intended.	2022-09-15 08:54:04 -04:00
Kamil Rakoczy	da20da264b	Add --build-jobs, and rework arguments for -j (#3623 )	2022-09-15 08:28:58 -04:00
Geza Lore	22b9dfb9c9	Split and re-order AstNode definitions (#3622 ) - Move DType representations into V3AstNodeDType.h - Move AstNodeMath and subclasses into V3AstNodeMath.h - Move any other AstNode subtypes into V3AstNodeOther.h - Fix up out-of-order definitions via inline methods and implementations in V3Inlines.h and V3AstNodes.cpp - Enforce declaration order of AstNode subtypes via astgen, which will now fail when definitions are mis-ordered.	2022-09-15 13:10:39 +01:00
Geza Lore	27031ed688	Merge branch 'master' into develop-v5	2022-09-15 10:28:35 +01:00
Wilson Snyder	d85b909054	Internals: Use std:: for mem and str functions.	2022-09-14 21:10:19 -04:00
Wilson Snyder	75fd71d7e5	Add --main to generate main() C++ (previously was experimental only) (#3265 ).	2022-09-14 20:18:40 -04:00
Ryszard Rozak	a3c58d7b70	Support IEEE constant signal strengths (#3601 ).	2022-09-14 07:39:27 -04:00
Kamil Rakoczy	ae466b1703	Internals: Improve Verilation peak memory usage in V3Subst (#3512 ).	2022-09-14 07:37:51 -04:00
Geza Lore	2564484429	astgen: Rewrite in a more OOP way, in preparation for extensions Rely less on strings and represent AstNode classes as a 'class Node', with all associated properties kept together, rather than distributed over multiple dictionaries or constructed at retrieval time. No functional change intended.	2022-09-13 21:54:12 +01:00
Kamil Rakoczy	93a044f587	Internals: Rework addFilesp towards parallel emit (#3620 ). No functional change intended.	2022-09-13 12:15:34 -04:00
Wilson Snyder	81fe35ee2e	Fix typedef'ed class conversion to boolean (#3616 ).	2022-09-12 18:03:56 -04:00
Geza Lore	08b6bdddf9	Update default --mod-prefix when --prefix is repeated Fixes #3603	2022-09-12 17:25:09 +01:00
Kamil Rakoczy	4d49db48a3	Internals: Remove usage of user1 from EmitCTrace (#3617 ). No Functional change intended.	2022-09-12 12:00:41 -04:00
Kamil Rakoczy	9b2266f68c	Internals: Remove usage of global state in V3EmitCFunc (#3615 ). No functional change intended.	2022-09-12 11:59:14 -04:00
Wilson Snyder	752f425025	Tests: Process/Semaphore/Mailbox testing (all fail until supported)	2022-09-11 13:05:24 -04:00
Gustav Svensk	47262cd4ec	Fix arguments in non-static method call (#3547 ) (#3582 )	2022-09-11 12:33:31 -04:00
Wilson Snyder	47e64535d6	Commentary	2022-09-11 12:25:44 -04:00
Geza Lore	90ab746a42	Make it possible to parallelize ico and act scheduling sections Small fixup patch so the 'ico' and 'act' scheduling sections could be ordered as multi-threaded. However, we still only order these single threaded at the moment (but switching them to multi-threaded now works).	2022-09-06 16:01:13 +01:00
Geza Lore	fd6275a62b	Merge branch 'master' into develop-v5	2022-09-05 17:03:43 +01:00
Krzysztof Bieganski	6b6790fc50	Preserve return type of `AstNode::addNext` via templating (#3597 ) No functional change intended. Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-09-05 16:56:57 +01:00
Krzysztof Bieganski	fb931087ab	Add stats tracking for `V3Undriven`. (#3600 ) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-09-05 16:20:38 +01:00
Krzysztof Bieganski	a2e1b32a1c	Fix inlining of forks (#3594 ) Before this change, some forked processes were being inlined in `V3Timing` because they contained no `CAwait`s. This only works under the assumption that no `CAwait`s will be added there later, which is not true, as a function called by a forked process could be turned into a coroutine later. The call would be wrapped in a new `CAwait`, but the process itself would have already been inlined at this point. This commit moves the inlining to `transformForks` in `V3SchedTiming`, which is called at a point when all `CAwait`s are already in place. Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-09-05 15:19:19 +01:00
Krzysztof Bieganski	54f89bce42	Move `SenExprBuilder` to a header. (#3598 ) No functional change intended. Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-09-05 15:17:51 +01:00
Krzysztof Bieganski	8b19d02e3b	Fix `co_await VlNow{}` being added too many times (#3596 ) (or not at all) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-09-05 11:46:34 +01:00
Geza Lore	937e893b6d	Build verilator_bin with -O3 (#3592 ) This is consistently a few percent faster.	2022-09-03 22:10:07 +01:00
Geza Lore	d42a2d6494	Fix V3Gate crash on circular logic The recent patch to defer substitutions on V3Gate crashes on circular logic that has cycle length >= 3 with all inlineable signals (cycle length 2 is detected correctly and is not inlined). Fix by stopping recursion at the loop-back edge. Fixes #3543	2022-09-02 19:58:58 +01:00
Geza Lore	8e8f4b1e5c	Remove AstVarScope::valuep() and related code This is detritus from when V3TraceDecl used to run after V3Gate, today V3TraceDecl runs before V3Gate and this value has no function at all. No functional change intended.	2022-09-02 16:44:13 +01:00
Geza Lore	298f71f2b1	Merge branch 'master' into develop-v5	2022-09-02 12:19:35 +01:00
Geza Lore	2ba39b25f1	Replace dynamic_casts with static_casts dynamic_cast is not free. Replace obvious instances (where the result is unconditionally dereferenced) with static_cast in contexts with performance implications.	2022-09-02 12:08:34 +01:00
Geza Lore	5c828b7e60	V3Partition: use V3Lists to keep track of SiblingMCs Replace std::set<SiblingMC> with V3Lists to keep track of SiblingMCs associated with MTasks, use a std::set<LogicMTask*> for ensuring uniqueness. This yields a bit more speed in PartContraction.	2022-09-01 19:40:44 +01:00
Geza Lore	4640bea31a	V3Partition: More improvements for PartFixDataHazards - Remove redundant loop through the MTask graph - Gather variables directly from the OrderGraph, which is simpler and faster.	2022-09-01 16:30:04 +01:00
Geza Lore	875361d7ce	V3Partition: Reduce working set size of PartContraction (#3587 ) This yields an additional 25% speedup of MT scheduling.	2022-09-01 16:29:40 +01:00
Wilson Snyder	849bb5590a	Merge branch 'master' into develop-v5	2022-08-31 19:51:07 -04:00
Wilson Snyder	51daa64e9a	Fix --hierarchical with order-based pin connections (#3585 ).	2022-08-31 18:12:21 -04:00
Geza Lore	c0f9b0d8f6	V3Partition: Refactor initialization of MTask dependencies No functional change	2022-08-31 16:54:04 +01:00
Geza Lore	505bba14eb	Improve PartFixDataHazards for clarity and speed. - Use modern C++ - Implement OrderLogicVertex->LogicMTask map with OrderLogicVertex::userp(), instead of std::unordered_map - Simplify data structures - Simplify code and assert properties No functional change.	2022-08-31 16:52:05 +01:00
Geza Lore	ebbe24966c	Remove unnecessary virtual methods	2022-08-31 16:52:05 +01:00
Geza Lore	881c3f6e40	Minor optimization of PartContraction Remove rarely used debug code from initialization loop.	2022-08-31 16:52:05 +01:00
Geza Lore	546aeab9f2	V3Order: Minor refactoring for clarity Refactor ProcessMoveBuildGraph utilizing the fact that OrderGraph is a bipartite graph, also remove unnecessary unordered_map and distribute variable domain map. No functional change.	2022-08-31 16:52:05 +01:00
Geza Lore	8de21e9bb7	Document and ensure OrderGraph is bipartite Minor refactoring and documentation. No functional change.	2022-08-31 16:52:05 +01:00
Geza Lore	2ecda74471	Merge branch 'master' into develop-v5	2022-08-31 10:45:18 +01:00
Aleksander Kiryk	2136afde6b	Support negated properties (#3572 )	2022-08-30 06:33:42 -04:00
Wilson Snyder	ea55db7286	Internals: Cleanup some string constructors. No functional change.	2022-08-30 01:02:39 -04:00
Wilson Snyder	819e8741cc	Merge branch 'master' into develop-v5	2022-08-30 00:20:21 -04:00
Wilson Snyder	6a5f77b278	Internals: Cleanup some string/model constructors. No functional change.	2022-08-29 23:50:32 -04:00
Wilson Snyder	8658a0d7dc	Internals: Constructor format update. No functional change.	2022-08-29 23:05:52 -04:00
Wilson Snyder	c335aad25f	Fix --hierarchical with order-based pin connections (#3583 ).	2022-08-29 22:49:19 -04:00
Wilson Snyder	9d9d647c1f	Fix indentation of --protect import function SV code.	2022-08-29 22:28:02 -04:00
Wilson Snyder	d47a37fb76	Internals: Cleanup constructors etc. No functional change.	2022-08-29 22:17:27 -04:00
Aleksander Kiryk	24ec84851a	Support $sampled (#3569 )	2022-08-29 08:39:41 -04:00
Arkadiusz Kozdra	0a3a15a66e	Support class parameters (#2231 ) (#3541 )	2022-08-28 10:24:55 -04:00
Krzysztof Bieganski	2af5304884	Fix tracing of slow coroutines (#3576 part) (#3579 )	2022-08-26 05:11:44 -05:00
Varun Koyyalagunta	5869fdf7f6	Fix $dump systemtask with --output-split-cfuncs (#3495 ) (#3497 )	2022-08-25 18:29:11 -05:00
Krzysztof Bieganski	1a1d2ecfd9	Enable tracing in generated main (#3578 ) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-25 14:55:37 +01:00
Geza Lore	5c356a4680	Merge branch 'master' into develop-v5	2022-08-22 14:32:06 +01:00
Krzysztof Bieganski	39af5d020e	Timing support (#3363 ) Adds timing support to Verilator. It makes it possible to use delays, event controls within processes (not just at the start), wait statements, and forks. Building a design with those constructs requires a compiler that supports C++20 coroutines (GCC 10, Clang 5). The basic idea is to have processes and tasks with delays/event controls implemented as C++20 coroutines. This allows us to suspend and resume them at any time. There are five main runtime classes responsible for managing suspended coroutines: * `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle` with move semantics and automatic cleanup. * `VlDelayScheduler`, for coroutines suspended by delays. It resumes them at a proper simulation time. * `VlTriggerScheduler`, for coroutines suspended by event controls. It resumes them if its corresponding trigger was set. * `VlForkSync`, used for syncing `fork..join` and `fork..join_any` blocks. * `VlCoroutine`, the return type of all verilated coroutines. It allows for suspending a stack of coroutines (normally, C++ coroutines are stackless). There is a new visitor in `V3Timing.cpp` which: * scales delays according to the timescale, * simplifies intra-assignment timing controls and net delays into regular timing controls and assignments, * simplifies wait statements into loops with event controls, * marks processes and tasks with timing controls in them as suspendable, * creates delay, trigger scheduler, and fork sync variables, * transforms timing controls and fork joins into C++ awaits There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`) that integrate static scheduling with timing. This involves providing external domains for variables, so that the necessary combinational logic gets triggered after coroutine resumption, as well as statements that need to be injected into the design eval function to perform this resumption at the correct time. 
There is also a function that transforms forked processes into separate functions. See the comments in `verilated_timing.h`, `verilated_timing.cpp`, `V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals documentation for more details. Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-22 13:26:32 +01:00
Geza Lore	9ac64d0b92	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-20 21:18:50 +01:00
Wilson Snyder	7cc89b8b42	Merge branch 'master' into develop-v5	2022-08-20 14:19:45 -04:00
Wilson Snyder	c6607724cb	Fix clang warning.	2022-08-20 14:19:00 -04:00
Wilson Snyder	ebb37b0156	Merge branch 'master' into develop-v5	2022-08-20 14:02:09 -04:00
Wilson Snyder	90dc04cf93	Add --future0 and --future1 options.	2022-08-20 14:01:13 -04:00
Krzysztof Bieganski	10cf492946	Add support for expressions in event controls (#3550 ) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-19 20:18:38 +02:00
Geza Lore	4d81eb021d	Revert "Improve performance of MTask coarsening" This reverts commit `83475008d9`.	2022-08-19 18:03:45 +01:00
Geza Lore	83475008d9	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-19 16:59:20 +01:00
Geza Lore	03ac7ad730	Make PartPropagateCp specific to the MTask graph While keeping the client code abstract in PartPropagateCp is nice for testing, there is performance to be had removing the abstraction. As this code dominates in scheduling large designs, we eliminate the abstraction and re-work the testing to use the actual LogicMTask and MTaskEdge graph types. No functional change intended.	2022-08-19 14:06:11 +01:00
Geza Lore	cd50949a7e	Reuse MTaskEdge instances in MT scheduling Instead of deleting then re-allocating MTaskEdge instances when merging two MTasks, just redirect the edges of the donor MTask to the recipient MTask. This is both faster as it avoids an allocation and a deletion, together with one update of the sibling maps, and also makes the algorithm more stable due to MergeCandidate IDs being stable and allocated up front for all MTaskEdges, before any SiblingMCs are allocated. Perturbations in output are expected as the IDs used to break ties between merge candidates with equal costs are not updated when redirecting an edge (on purpose). The relinking of only one end of the graph edges also perturbs the order in which they are enumerated, which does change candidate opportunities when the number of edges is larger than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when IDs are updated and edges are updated to appear in their original order.	2022-08-19 14:06:11 +01:00
Geza Lore	f0040c7b9a	Remove reliance on pointer comparison in MT scheduling The critical path propagation used to rely on a pointer comparison to break equal scoring critical path updates. Use the corresponding mtask ids instead, which is deterministic across invocations.	2022-08-19 14:06:11 +01:00
Geza Lore	f8a0389e73	Do not use stepCost when gathering sibling merge candidates siblingPairFromRelatives gathers neighbours of a vertex, and sorts them. It then takes the N best nodes, and creates sibling merge candidates from them. We now use the unadjusted cost instead of the step cost of the vertices when sorting. This is both faster as we need not do the log-space rounding to compute stepCost, and will also make similar but yet cheaper nodes appear closer to the front as we don't lose precision in rounding, hence they are more likely to be entered as merge candidates. Note that when creating the merge candidate, we still use the stepCost, so its purpose of reducing the propagation of critical path updates is maintained in full. In summary, this should make both Verilator and the generated model very slightly faster, at least in theory, and I have observed minor improvement in places.	2022-08-19 14:06:11 +01:00
Geza Lore	b436794773	Add specialized GraphStreamUnordered GraphStreamUnordered used to be GraphStream<std::less<const V3GraphVertex*>>, but a lot of performance improvements can be had by a specialized implementation, so added a highly optimized one. This helps a lot with --debug-partition.	2022-08-19 14:06:11 +01:00
Geza Lore	1404319b28	Merge branch 'master' into develop-v5	2022-08-19 13:39:44 +01:00
Geza Lore	90d22cbec6	Fix `AstNode::exists` return type	2022-08-19 13:22:06 +01:00
Krzysztof Bieganski	33e2acfe61	Fix `AstNode::forall` return type (#3559 ) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-19 12:33:17 +01:00
Ryszard Rozak	db5fdfb0ee	Fix === with some tristate constants (#3551 ).	2022-08-18 07:03:05 -04:00
Krzysztof Bieganski	951cd73fe0	Handle MemberSel in V3EmitV.cpp (#3555 )	2022-08-18 06:33:45 -04:00
Arkadiusz Kozdra	0eeb40b975	Fix converting subclasses to string (#3552 )	2022-08-17 18:08:43 -04:00
Wilson Snyder	f435d96241	Fix case statement comparing string literal (#3544 ).	2022-08-15 21:56:09 -04:00
github action	d32e3f042f	Apply 'make format'	2022-08-12 10:56:12 +00:00
Mostafa Gamal	df5f95a5bd	Fix nested default assignment for struct pattern (#3511 ) (#3524 )	2022-08-12 06:55:07 -04:00
Drew Ranck	b0c475205b	Fix void-cast queue pop_front or pop_back (#3542 ) (#3364 ) Fix compile error for queue method usage, if it is the first statement in a block of code, and the return value is not used. Example: > if (foo) > void'(bar.pop_front());	2022-08-12 06:51:25 -04:00
Wilson Snyder	cbe1b8e266	Fix segfault exporting non-existent package (#3535 ).	2022-08-08 17:53:50 -04:00
Mariusz Glebocki	2b12fe5773	Internals: Construct V3Number with correct type instead of changing it manually. (#3529 )	2022-08-08 08:17:02 -04:00
Yutetsu TAKATSUKASA	d20f22beb1	Fix tristate logic when reading inout port in a module #3399 (#3523 ) * Tests: Add a test to reproduce #3399 * Fix #3399. When reading an inout port in a module, it should refer the original inout port, not the generated MODTEMP.	2022-08-07 21:12:57 +09:00
Mariusz Glebocki	122e89ffde	Fix V3Number::isMsbXZ(). (#3530 )	2022-08-05 19:12:52 +01:00
Geza Lore	c266739e9f	Merge branch 'master' into develop-v5	2022-08-05 12:17:57 +01:00
Geza Lore	96a4b3e5a5	Update clang-format config and apply - Regroup and sort #include directives (like we used to, but automatic) - Set AlwaysBreakTemplateDeclarations to true	2022-08-05 12:00:24 +01:00
Geza Lore	7403226a97	Merge branch 'master' into develop-v5	2022-08-04 10:03:38 +01:00
Geza Lore	fac8e76923	Rework SortByValueMap for better performance Keep a single std::set of key/value pairs, and a single unordered_map from key to iterators into the set. Also improve some of the accessing mechanisms using modern C++. This speeds up multi-threaded ordering by about 10%.	2022-08-03 21:17:02 +01:00
Geza Lore	b864f5f5ba	V3Partition: use static_cast with LogicMTaskVertex dynamic_cast is not free, and the mtask graph contains only LogicMTaskVertex vertices, use static_cast instead for some speedup.	2022-08-03 17:05:01 +01:00
Geza Lore	f9f66d787e	Fix integer overflow in V3Unroll (#3451 )	2022-08-03 09:41:30 +01:00
Geza Lore	bd211c87aa	astgen: split 'visit' method declarations from definitions Add definitions to V3Ast.cpp, and use static_cast. This fixes a lot of clang-tidy noise.	2022-08-02 17:53:19 +01:00
Geza Lore	6fc25dae9e	Fix clang-tidy warnings (#3522 )	2022-08-02 15:58:48 +01:00
Kamil Rakoczy	cfb6fd8b34	Reduce max RSS usage (#3483 ) By constant folding nodes earlier in V3Expand, we can save some max RSS on large designs.	2022-08-02 13:36:14 +01:00
Geza Lore	39d1a62f9e	Fix change detection on unpacked arrays Expand array assignment when creating the trigger, as V3Expand might mangle it otherwise.	2022-08-02 13:01:41 +01:00
Geza Lore	ba66fa7200	Merge branch 'master' into develop-v5	2022-08-02 11:16:35 +01:00
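
Note on commit fac8e76923 (Rework SortByValueMap): the message describes keeping a single std::set of key/value pairs plus a single unordered_map from key to iterators into the set. A minimal C++ sketch of that layout follows. It is an illustration under assumptions, not Verilator's actual code: the class name `ValueSortedMap` and its methods `set`, `erase`, `minKey` and `at` are hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <set>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only; this is NOT Verilator's SortByValueMap.
// One ordered set holds (value, key) pairs, and one hash map takes each key
// to its iterator in the set, so lookups and updates never scan the set.
template <typename Key, typename Value>
class ValueSortedMap final {
    // Value comes first so the set orders by value, then by key
    using Pair = std::pair<Value, Key>;
    using SetIt = typename std::set<Pair>::iterator;

    std::set<Pair> m_sorted;  // All entries, ordered by value
    std::unordered_map<Key, SetIt> m_index;  // Key -> position in m_sorted

public:
    // Insert a new key, or change the value of an existing key
    void set(const Key& key, const Value& value) {
        const auto it = m_index.find(key);
        if (it != m_index.end()) {
            // std::set elements are immutable: erase, then re-insert
            m_sorted.erase(it->second);
            it->second = m_sorted.emplace(value, key).first;
        } else {
            m_index.emplace(key, m_sorted.emplace(value, key).first);
        }
    }

    // Remove a key; returns false if the key was not present
    bool erase(const Key& key) {
        const auto it = m_index.find(key);
        if (it == m_index.end()) return false;
        m_sorted.erase(it->second);
        m_index.erase(it);
        return true;
    }

    bool empty() const { return m_sorted.empty(); }
    std::size_t size() const { return m_sorted.size(); }

    // Key with the smallest value; the map must not be empty
    const Key& minKey() const { return m_sorted.begin()->second; }

    // Value stored for a key; throws std::out_of_range if absent
    const Value& at(const Key& key) const { return m_index.at(key)->first; }
};
```

Storing iterators is safe here because std::set iterators stay valid until the element they point to is erased. Changing a value costs one hash lookup plus an erase and an insert in the set, and no second ordered container has to be kept in sync.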

3638 Commits