verilator

Commit Graph

Author	SHA1	Message	Date
Geza Lore	9ac64d0b92	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-20 21:18:50 +01:00
Wilson Snyder	7cc89b8b42	Merge branch 'master' into develop-v5	2022-08-20 14:19:45 -04:00
Wilson Snyder	c6607724cb	Fix clang warning.	2022-08-20 14:19:00 -04:00
Wilson Snyder	ebb37b0156	Merge branch 'master' into develop-v5	2022-08-20 14:02:09 -04:00
Wilson Snyder	90dc04cf93	Add --future0 and --future1 options.	2022-08-20 14:01:13 -04:00
Krzysztof Bieganski	10cf492946	Add support for expressions in event controls (#3550) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-19 20:18:38 +02:00
Geza Lore	4d81eb021d	Revert "Improve performance of MTask coarsening" This reverts commit `83475008d9`.	2022-08-19 18:03:45 +01:00
Geza Lore	83475008d9	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-19 16:59:20 +01:00
Geza Lore	03ac7ad730	Make PartPropagateCp specific to the MTask graph While keeping the client code abstract in PartPropagateCp is nice for testing, there is performance to be had removing the abstraction. As this code dominates in scheduling large designs, we eliminate the abstraction and re-work the testing to use the actual LogicMTask and MTaskEdge graph types. No functional change intended.	2022-08-19 14:06:11 +01:00
Geza Lore	cd50949a7e	Reuse MTaskEdge instances in MT scheduling Instead of deleting then re-allocating MTaskEdge instances when merging two MTasks, just redirect the edges of the donor MTask to the recipient MTask. This is both faster as it avoids an allocation and a deletion, together with one update of the sibling maps, and also makes the algorithm more stable due to MergeCandidate IDs being stable and allocated up front for all MTaskEdges, before any SiblingMCs are allocated. Perturbations in output are expected as the IDs used to break ties between merge candidates with equal costs are not updated when redirecting an edge (on purpose). The relinking of only one end of the graph edges also perturbs the order in which they are enumerated, which does change candidate opportunities when the number of edges is larger than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when IDs are updated and edges are updated to appear in their original order.	2022-08-19 14:06:11 +01:00
Geza Lore	f0040c7b9a	Remove reliance on pointer comparison in MT scheduling The critical path propagation used to rely on a pointer comparison to break equal scoring critical path updates. Use the corresponding mtask ids instead, which is deterministic across invocations.	2022-08-19 14:06:11 +01:00
Geza Lore	f8a0389e73	Do not use stepCost when gathering sibling merge candidates siblingPairFromRelatives gathers neighbours of a vertex, and sorts them. It then takes the N best nodes, and creates sibling merge candidates from them. We now use the unadjusted cost instead of the step cost of the vertices when sorting. This is both faster as we need not do the log-space rounding to compute stepCost, and will also make similar but yet cheaper nodes appear closer to the front as we don't lose precision in rounding, hence they are more likely to be entered as merge candidates. Note that when creating the merge candidate, we still use the stepCost, so its purpose of reducing the propagation of critical path updates is maintained in full. In summary, this should make both Verilator and the generated model very slightly faster, at least in theory, and I have observed minor improvement in places.	2022-08-19 14:06:11 +01:00
Geza Lore	b436794773	Add specialized GraphStreamUnordered GraphStreamUnordered used to be GraphStream<std::less<const V3GraphVertex*>>, but a lot of performance improvements can be had by a specialized implementation, so added a highly optimized one. This helps a lot with --debug-partition.	2022-08-19 14:06:11 +01:00
Geza Lore	1404319b28	Merge branch 'master' into develop-v5	2022-08-19 13:39:44 +01:00
Geza Lore	90d22cbec6	Fix `AstNode::exists` return type	2022-08-19 13:22:06 +01:00
Krzysztof Bieganski	33e2acfe61	Fix `AstNode::forall` return type (#3559) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-19 12:33:17 +01:00
Ryszard Rozak	db5fdfb0ee	Fix === with some tristate constants (#3551).	2022-08-18 07:03:05 -04:00
Krzysztof Bieganski	951cd73fe0	Handle MemberSel in V3EmitV.cpp (#3555)	2022-08-18 06:33:45 -04:00
Arkadiusz Kozdra	0eeb40b975	Fix converting subclasses to string (#3552)	2022-08-17 18:08:43 -04:00
Wilson Snyder	93272c13fd	Tests: Confirm fixed (#181)	2022-08-15 22:17:36 -04:00
Wilson Snyder	43abaeb055	Tests: Confirm fixed (#485)	2022-08-15 22:17:17 -04:00
Wilson Snyder	18b9e661c9	Tests: Confirm fixed (#446)	2022-08-15 22:17:09 -04:00
Wilson Snyder	f435d96241	Fix case statement comparing string literal (#3544).	2022-08-15 21:56:09 -04:00
github action	d32e3f042f	Apply 'make format'	2022-08-12 10:56:12 +00:00
Mostafa Gamal	df5f95a5bd	Fix nested default assignment for struct pattern (#3511) (#3524)	2022-08-12 06:55:07 -04:00
Drew Ranck	b0c475205b	Fix void-cast queue pop_front or pop_back (#3542) (#3364) Fix compile error for queue method usage, if it is the first statement in a block of code, and the return value is not used. Example: > if (foo) > void'(bar.pop_front());	2022-08-12 06:51:25 -04:00
Wilson Snyder	1e2219347e	Internals: Cleanup ifdef, move up not under compiler version ifdef	2022-08-11 17:41:43 -04:00
Wilson Snyder	cbe1b8e266	Fix segfault exporting non-existent package (#3535).	2022-08-08 17:53:50 -04:00
Mariusz Glebocki	2b12fe5773	Internals: Construct V3Number with correct type instead of changing it manually. (#3529)	2022-08-08 08:17:02 -04:00
Geza Lore	a4fd6d38fb	Add operator != to VlWide This is required by VlUnpacked::neq	2022-08-07 13:13:28 +01:00
Yutetsu TAKATSUKASA	d20f22beb1	Fix tristate logic when reading inout port in a module #3399 (#3523) * Tests: Add a test to reproduce #3399 * Fix #3399. When reading an inout port in a module, it should refer the original inout port, not the generated MODTEMP.	2022-08-07 21:12:57 +09:00
Wilson Snyder	f4fe10844b	Tests: Fix t_flag_help.pl (#3532).	2022-08-07 04:57:59 -04:00
Mariusz Glebocki	122e89ffde	Fix V3Number::isMsbXZ(). (#3530)	2022-08-05 19:12:52 +01:00
Geza Lore	c266739e9f	Merge branch 'master' into develop-v5	2022-08-05 12:17:57 +01:00
Geza Lore	96a4b3e5a5	Update clang-format config and apply - Regroup and sort #include directives (like we used to, but automatic) - Set AlwaysBreakTemplateDeclarations to true	2022-08-05 12:00:24 +01:00
Geza Lore	7403226a97	Merge branch 'master' into develop-v5	2022-08-04 10:03:38 +01:00
Geza Lore	fac8e76923	Rework SortByValueMap for better performance Keep a single std::set of key/value pairs, and a single unordered_map from key to iterators into the set. Also improve some of the accessing mechanisms using modern C++. This speeds up multi-threaded ordering by about 10%.	2022-08-03 21:17:02 +01:00
Geza Lore	b864f5f5ba	V3Partition: use static_cast with LogicMTaskVertex dynamic_cast is not free, and the mtask graph contains only LogicMTaskVertex vertices, use static_cast instead for some speedup.	2022-08-03 17:05:01 +01:00
Geza Lore	f9f66d787e	Fix integer overflow in V3Unroll (#3451)	2022-08-03 09:41:30 +01:00
Geza Lore	bd211c87aa	astgen: split 'visit' method declarations from definitions Add definitions to V3Ast.cpp, and use static_cast. This fixes a lot of clang-tidy noise.	2022-08-02 17:53:19 +01:00
Geza Lore	6c33e6e889	Tell clang-tidy .h files are C++ (not C) headers	2022-08-02 17:53:19 +01:00
Geza Lore	6fc25dae9e	Fix clang-tidy warnings (#3522)	2022-08-02 15:58:48 +01:00
Kamil Rakoczy	cfb6fd8b34	Reduce max RSS usage (#3483) By constant folding nodes earlier in V3Expand, we can save some max RSS on large designs.	2022-08-02 13:36:14 +01:00
Geza Lore	39d1a62f9e	Fix change detection on unpacked arrays Expand array assignment when creating the trigger, as V3Expand might mangle it otherwise.	2022-08-02 13:01:41 +01:00
Geza Lore	ba66fa7200	Merge branch 'master' into develop-v5	2022-08-02 11:16:35 +01:00
Geza Lore	cb60663d49	V3Gate: Defer substitutions until required as well Similarly to the earlier patch that defers constant folding on optimized logic, now we also defer the variable substitutions as well. This again eliminates a lot of traversals, and yields another ~10x speedup of V3Gate on a design where V3Gate used to dominate while producing identical results.	2022-08-01 12:54:41 +01:00
Geza Lore	0d2bf23d82	V3Gate: Defer constant folding until required Rather than constant folding each logic block after every substitution, only constant fold updated blocks when re-analysed, or at the end. This removes a lot of invocations of V3Const on large blocks that can be optimized well, and should yield the same result. This speeds up V3Gate by ~4x on a design where V3Gate dominates.	2022-07-31 20:42:04 +01:00
Geza Lore	682a60e325	Cleanup V3Gate, no functional change	2022-07-31 20:07:54 +01:00
Geza Lore	2ab6272cc7	Use AstNode::foreach in V3Gate This yields a little speedup.	2022-07-31 20:05:25 +01:00
Geza Lore	152a6cd886	Improve AstNode::foreach (also exists and forall) Speed improvements: - Use a direct, recursion-free implementation - Improve pre-fetching Functionality: - Support remove/replace of currently iterated node	2022-07-31 19:07:32 +01:00
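
Commit fac8e76923 above describes reworking SortByValueMap around a single std::set of key/value pairs plus a single unordered_map from key to iterators into that set. The following is only a minimal sketch of that general layout, not Verilator's actual code: the class name ValueSortedMap and all of its member names are hypothetical, and the real class may differ in interface and ordering rules.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only; this is NOT Verilator's SortByValueMap.
// One ordered set holds (value, key) pairs, and one hash map points from each
// key to its element in the set, so updates need no second ordered lookup.
template <typename Key, typename Value>
class ValueSortedMap final {
    using Pair = std::pair<Value, Key>;  // Ordered by value first, then by key
    std::set<Pair> m_sorted;  // All entries, smallest value first
    std::unordered_map<Key, typename std::set<Pair>::iterator> m_index;  // Key -> set element

public:
    bool empty() const { return m_sorted.empty(); }
    std::size_t size() const { return m_sorted.size(); }
    bool has(const Key& key) const { return m_index.count(key) != 0; }

    // Insert a new key, or change the value of an existing key
    void set(const Key& key, const Value& value) {
        const auto it = m_index.find(key);
        if (it != m_index.end()) {
            // std::set iterators stay valid until their own element is erased,
            // so the stored iterator can be used directly here.
            m_sorted.erase(it->second);
            it->second = m_sorted.emplace(value, key).first;
        } else {
            m_index.emplace(key, m_sorted.emplace(value, key).first);
        }
    }

    // Remove a key if present; do nothing otherwise
    void erase(const Key& key) {
        const auto it = m_index.find(key);
        if (it == m_index.end()) return;
        m_sorted.erase(it->second);
        m_index.erase(it);
    }

    // Key and value of the entry with the smallest value
    const Key& minKey() const {
        assert(!empty());
        return m_sorted.begin()->second;
    }
    const Value& minValue() const {
        assert(!empty());
        return m_sorted.begin()->first;
    }
};
```

In this sketch, ties between equal values fall back to comparing keys, which keeps the ordering deterministic as long as the keys themselves are (for example integer IDs rather than pointers).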
