verilator

Commit Graph

Author	SHA1	Message	Date
Geza Lore	875361d7ce	V3Partition: Reduce working set size of PartContraction (#3587) This yields an additional 25% speedup of MT scheduling.	2022-09-01 16:29:40 +01:00
Geza Lore	c0f9b0d8f6	V3Partition: Refactor initialization of MTask dependencies No functional change	2022-08-31 16:54:04 +01:00
Geza Lore	505bba14eb	Improve PartFixDataHazards for clarity and speed. - Use modern C++ - Implement OrderLogicVertex->LogicMTask map with OrderLogicVertex::userp(), instead of std::unordered_map - Simplify data structures - Simplify code and assert properties No functional change.	2022-08-31 16:52:05 +01:00
Geza Lore	ebbe24966c	Remove unnecessary virtual methods	2022-08-31 16:52:05 +01:00
Geza Lore	881c3f6e40	Minor optimization of PartContraction Remove rarely used debug code from initialization loop.	2022-08-31 16:52:05 +01:00
Geza Lore	5c356a4680	Merge branch 'master' into develop-v5	2022-08-22 14:32:06 +01:00
Geza Lore	9ac64d0b92	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-20 21:18:50 +01:00
Wilson Snyder	ebb37b0156	Merge branch 'master' into develop-v5	2022-08-20 14:02:09 -04:00
Geza Lore	4d81eb021d	Revert "Improve performance of MTask coarsening" This reverts commit `83475008d9`.	2022-08-19 18:03:45 +01:00
Geza Lore	83475008d9	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-19 16:59:20 +01:00
Geza Lore	03ac7ad730	Make PartPropagateCp specific to the MTask graph While keeping the client code abstract in PartPropagateCp is nice for testing, there is performance to be had removing the abstraction. As this code dominates in scheduling large designs, we eliminate the abstraction and re-work the testing to use the actual LogicMTask and MTaskEdge graph types. No functional change intended.	2022-08-19 14:06:11 +01:00
Geza Lore	cd50949a7e	Reuse MTaskEdge instances in MT scheduling Instead of deleting then re-allocating MTaskEdge instances when merging two MTasks, just redirect the edges of the donor MTask to the recipient MTask. This is both faster as it avoids an allocation and a deletion, together with one update of the sibling maps, and also makes the algorithm more stable due to MergeCandidate IDs being stable and allocated up front for all MTaskEdges, before any SiblingMCs are allocated. Perturbations in output are expected as the IDs used to break ties between merge candidates with equal costs are not updated when redirecting an edge (on purpose). The relinking of only one end of the graph edges also perturbs the order in which they are enumerated, which does change candidate opportunities when the number of edges is larger than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when IDs are updated and edges are updated to appear in their original order.	2022-08-19 14:06:11 +01:00
Geza Lore	f0040c7b9a	Remove reliance on pointer comparison in MT scheduling The critical path propagation used to rely on a pointer comparison to break equal scoring critical path updates. Use the corresponding mtask ids instead, which is deterministic across invocations.	2022-08-19 14:06:11 +01:00
Geza Lore	f8a0389e73	Do not use stepCost when gathering sibling merge candidates siblingPairFromRelatives gathers neighbours of a vertex, and sorts them. It then takes the N best nodes, and creates sibling merge candidates from them. We now use the unadjusted cost instead of the step cost of the vertices when sorting. This is both faster as we need not do the log-space rounding to compute stepCost, and will also make similar but yet cheaper nodes appear closer to the front as we don't lose precision in rounding, hence they are more likely to be entered as merge candidates. Note that when creating the merge candidate, we still use the stepCost, so its purpose of reducing the propagation of critical path updates is maintained in full. In summary, this should make both Verilator and the generated model very slightly faster, at least in theory, and I have observed minor improvement in places.	2022-08-19 14:06:11 +01:00
Geza Lore	c266739e9f	Merge branch 'master' into develop-v5	2022-08-05 12:17:57 +01:00
Geza Lore	96a4b3e5a5	Update clang-format config and apply - Regroup and sort #include directives (like we used to, but automatic) - Set AlwaysBreakTemplateDeclarations to true	2022-08-05 12:00:24 +01:00
Geza Lore	7403226a97	Merge branch 'master' into develop-v5	2022-08-04 10:03:38 +01:00
Geza Lore	fac8e76923	Rework SortByValueMap for better performance Keep a single std::set of key/value pairs, and a single unordered_map from key to iterators into the set. Also improve some of the accessing mechanisms using modern C++. This speeds up multi-threaded ordering by about 10%.	2022-08-03 21:17:02 +01:00
Geza Lore	b864f5f5ba	V3Partition: use static_cast with LogicMTaskVertex dynamic_cast is not free, and the mtask graph contains only LogicMTaskVertex vertices, use static_cast instead for some speedup.	2022-08-03 17:05:01 +01:00
Wilson Snyder	4859f5e1fa	Merge branch 'master' into develop-v5	2022-07-30 10:26:16 -04:00
Wilson Snyder	b9d7819faa	Internals: Fix some cppcheck issues. Some dump functions fixed.	2022-07-30 10:01:39 -04:00
Geza Lore	582da6df9a	Merge branch 'master' into develop-v5	2022-07-14 10:08:52 +01:00
Geza Lore	87f1e06c41	Small algorithmic improvement of PartContraction::siblingPairFromRelatives Use std::partial_sort for the non-exhaustive case. This is O(n) instead of O(nlog(n)) in the size of the candidate list being sorted. (It actually is O(nlog(k)), but k is constant 6 in the non-exhaustive case).	2022-07-12 19:10:01 +01:00
Geza Lore	7e8bafd217	Remove static data use from PartContraction::siblingPairFromRelatives Use std::sort with lambda rather than qsort with static function and static data. Verilation performance neutral.	2022-07-12 19:09:40 +01:00
Geza Lore	282887d9c6	Fix code coverage holes Fixes #3422	2022-05-16 21:22:21 +01:00
Geza Lore	599d23697d	IEEE compliant scheduler (#3384) This is a major re-design of the way code is scheduled in Verilator, with the goal of properly supporting the Active and NBA regions of the SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4. With this change, all internally generated clocks should simulate correctly, and there should be no more need for the `clock_enable` and `clocker` attributes for correctness in the absence of Verilator generated library models (`--lib-create`). Details of the new scheduling model and algorithm are provided in docs/internals.rst. Implements #3278	2022-05-15 16:03:32 +01:00
HungMingWu	880a9be3b1	Internal: Add C++20ish reverse_view for range loops. No functional change (#3388). Signed-off-by: HungMingWu <u9089000@gmail.com>	2022-04-18 13:03:56 -04:00
Geza Lore	fbd568dc47	Prep for multiple AstExecGraph. No functional change.	2022-04-10 12:00:17 +01:00
Wilson Snyder	e02f97854c	Deprecate 'vluint64_t' and similar types (#3255).	2022-03-27 15:27:40 -04:00
Geza Lore	b1b5b5dfe2	Improve run-time profiling The --prof-threads option has been split into two independent options: 1. --prof-exec, for collecting verilator_gantt and other execution related profiling data, and 2. --prof-pgo, for collecting data needed for PGO The implementation of execution profiling is extricated from VlThreadPool and is now a separate class VlExecutionProfiler. This means --prof-exec can now be used for single-threaded models (though it does not measure a lot of things just yet). For consistency VerilatedProfiler is renamed VlPgoProfiler. Both VlExecutionProfiler and VlPgoProfiler are in verilated_profiler.{h/cpp}, but can be used completely independently. Also re-worked the execution profile format so it now only emits events without holding onto any temporaries. This is in preparation for some future optimizations that would be hindered by the introduction of function locals via AstText. Also removed the Barrier event. Clearing the profile buffers is not notably more expensive as the profiling records are trivially destructible.	2022-03-27 15:57:30 +02:00
Wilson Snyder	e6857df5c6	Internals: Rename Ast on non-node classes (#3262 ). No functional change. This commit has the following replacements applied: s/\bAstUserInUseBase\b/VNUserInUseBase/g; s/\bAstAttrType\b/VAttrType/g; s/\bAstBasicDTypeKwd\b/VBasicDTypeKwd/g; s/\bAstDisplayType\b/VDisplayType/g; s/\bAstNDeleter\b/VNDeleter/g; s/\bAstNRelinker\b/VNRelinker/g; s/\bAstNVisitor\b/VNVisitor/g; s/\bAstPragmaType\b/VPragmaType/g; s/\bAstType\b/VNType/g; s/\bAstUser1InUse\b/VNUser1InUse/g; s/\bAstUser2InUse\b/VNUser2InUse/g; s/\bAstUser3InUse\b/VNUser3InUse/g; s/\bAstUser4InUse\b/VNUser4InUse/g; s/\bAstUser5InUse\b/VNUser5InUse/g; s/\bAstVarType\b/VVarType/g;	2022-01-02 14:03:20 -05:00
Wilson Snyder	24a0d2a0c9	Internals: Favor member assignment initialization. No functional change intended.	2022-01-01 11:46:49 -05:00
Wilson Snyder	ca42be982c	Copyright year update.	2022-01-01 08:26:40 -05:00
Wilson Snyder	cd737065f2	Internals: More const. No functional change intended.	2021-11-26 17:55:36 -05:00
Wilson Snyder	010084201a	Internals: Remove dead code.	2021-11-26 16:15:08 -05:00
Wilson Snyder	05e12ab60e	Internals: More const. No functional change intended.	2021-11-26 10:52:45 -05:00
Wilson Snyder	37e3c6da70	Internals: Add more const. No functional change intended.	2021-11-13 13:50:44 -05:00
Geza Lore	e69a8e838d	Improve memory usage of V3Partition. Only performance change intended. (#3192)	2021-11-05 22:08:54 -04:00
Wilson Snyder	61612582e6	Improve memory usage of V3Partition. Only performance change intended.	2021-11-04 07:39:28 -04:00
Wilson Snyder	da5644211f	Improve memory usage of V3Partition. Only performance change intended.	2021-11-03 22:01:40 -04:00
Wilson Snyder	c1d7bfa617	Internals: Skip some asserts in fastpath partitioning.	2021-11-03 19:19:23 -04:00
Wilson Snyder	c26ce25cea	Internals: Add more const. No functional change.	2021-11-03 17:49:19 -04:00
Wilson Snyder	9029da5ab8	Add profile-guided optimization of mtasks (#3150).	2021-09-26 22:51:11 -04:00
Wilson Snyder	c2819923c5	Verilator_gantt now shows the predicted mtask times, eval times, and additional statistics.	2021-09-23 22:59:36 -04:00
Wilson Snyder	68f1432a68	Gantt: Subtract common start in slowpath to reduce collection measurement error.	2021-09-23 19:43:20 -04:00
Wilson Snyder	8ecdc85cf7	Internals: C++11 style cleanups. No functional change.	2021-07-11 18:42:01 -04:00
Wilson Snyder	c7499133b2	Internals: C++11 for bool. No functional change.	2021-07-11 10:42:32 -04:00
Morten Borup Petersen	fd0446f481	Internals: Add .dot graph visualization of ThreadSchedule (#3048) * Move MTaskState to ThreadSchedule MTaskState does not concern itself with sandbagging, and thus solely contains information related to the finalized schedule, i.e., completion time, thread ID and next MTask on thread. * Add .dot graph visualization of ThreadSchedule Follow-up to #2779. This commit adds the creation of .dot files - used by GraphViz - to visualize how mtasks are statically scheduled across the set of specified threads. We visualize each thread as a row, with nodes of a row being the mtasks scheduled for the given thread. The width of each mtask node is proportional to its cost. MTask dependencies are shown using an edge between the source and sink mtasks.	2021-07-06 07:06:00 -04:00
Geza Lore	708abe0dd1	Introduce model interface class, make $root part of Syms (#3036) This patch implements #3032. Verilator creates a module representing the SystemVerilog $root scope (V3LinkLevel::wrapTop). Until now, this was called the "TOP" module, which also acted as the user instantiated model class. Syms used to hold a pointer to this root module, but held instances of any submodule. This patch renames this root scope module from "TOP" to "$root", and introduces a separate model class which is now an interface class. As the root module is no longer the user interface class, it can now be made an instance of Syms, just like any other submodule. This allows absolute references into the root module to avoid an additional pointer indirection resulting in a potential speedup (about 1.5% on OpenTitan). The model class now also contains all non design specific generated code (e.g.: eval loops, trace config, etc), which additionally simplifies Verilator internals. Please see the updated documentation for the model interface changes.	2021-06-30 16:35:40 +01:00
Wilson Snyder	512fe0a2d1	Internals: Add const. No functional change.	2021-06-20 18:33:13 -04:00
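
Several of the commits above (87f1e06c41, 7e8bafd217, f0040c7b9a) describe selecting the N best neighbours with `std::partial_sort`, using a comparator instead of `qsort` with static data, and breaking ties by ID rather than by pointer. The following is a minimal sketch of that general idea only, not Verilator's actual code: `Candidate`, `cheaperFirst`, and `bestK` are hypothetical names invented for illustration, and the real `siblingPairFromRelatives` operates on MTask graph vertices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a neighbouring vertex (not Verilator's LogicMTask).
struct Candidate {
    unsigned id;    // stable identifier, used to break ties deterministically
    unsigned cost;  // unadjusted cost, used as the primary sort key
};

// Order by cost first, then by id, so that equal costs never depend on
// memory addresses and the result is reproducible across runs.
inline bool cheaperFirst(const Candidate& a, const Candidate& b) {
    if (a.cost != b.cost) return a.cost < b.cost;
    return a.id < b.id;
}

// Return the k cheapest candidates in sorted order.
// std::partial_sort only orders the first k elements, which costs
// O(n log k) rather than the O(n log n) of a full sort.
std::vector<Candidate> bestK(std::vector<Candidate> cands, std::size_t k) {
    k = std::min(k, cands.size());
    std::partial_sort(cands.begin(), cands.begin() + k, cands.end(), cheaperFirst);
    cands.erase(cands.begin() + k, cands.end());
    assert(std::is_sorted(cands.begin(), cands.end(), cheaperFirst));
    return cands;
}
```

Calling `bestK(neighbours, 6)` would mirror the constant k = 6 mentioned in commit 87f1e06c41 for the non-exhaustive case; with k fixed, the cost grows linearly in the number of neighbours.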


100 Commits