verilator

Commit Graph

Author	SHA1	Message	Date
Geza Lore	9ac64d0b92	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand-written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed; this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-20 21:18:50 +01:00
Geza Lore	4d81eb021d	Revert "Improve performance of MTask coarsening" This reverts commit `83475008d9`.	2022-08-19 18:03:45 +01:00
Geza Lore	83475008d9	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand-written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed; this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-19 16:59:20 +01:00
Geza Lore	cd50949a7e	Reuse MTaskEdge instances in MT scheduling Instead of deleting then re-allocating MTaskEdge instances when merging two MTasks, just redirect the edges of the donor MTask to the recipient MTask. This is both faster as it avoids an allocation and a deletion, together with one update of the sibling maps, and also makes the algorithm more stable due to MergeCandidate IDs being stable and allocated up front for all MTaskEdges, before any SiblingMCs are allocated. Perturbations in output are expected as the IDs used to break ties between merge candidates with equal costs are not updated when redirecting an edge (on purpose). The relinking of only one end of the graph edges also perturbs the order in which they are enumerated, which does change candidate opportunities when the number of edges is larger than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when IDs are updated and edges are updated to appear in their original order.	2022-08-19 14:06:11 +01:00
Geza Lore	b436794773	Add specialized GraphStreamUnordered GraphStreamUnordered used to be GraphStream<std::less<const V3GraphVertex*>>, but a lot of performance improvements can be had by a specialized implementation, so added a highly optimized one. This helps a lot with --debug-partition.	2022-08-19 14:06:11 +01:00
Geza Lore	a2792785fe	Add V3GraphVertex::dotRank to add GraphViz ranks to graph dumps This is a simple debugging aid to allow constraining the graph layout via GraphViz rank directives. Note this is not related in any way to the vertex 'rank' attribute used by some of the graph algorithms. No functional change.	2022-05-02 10:27:26 +01:00
Geza Lore	2ba9eb4228	Speed up TSP sort implementation - More efficient comparison by pre-computing sorting keys. - Remove work items in algorithms known to be redundant earlier. This greatly reduces data structure sizes. - Use V3GraphVertex->user() for state tracking instead of unordered_map; while both of these are constant time, the costs do add up. - In `makeMinSpanningTree`, instead of batch inserting outgoing edges of each visited vertex into an ordered set, keep an ordered set of sorted vectors of edges. This reduces the size of the ordered set significantly (it is now O(V) rather than O(E), and as the subject graph is a complete graph, V ~ sqrt(E), so this is a significant gain). - Use a vector + sorting in `perfectMatching` instead of an ordered set. This is faster on large working sets. This yields 3.8x speedup on the variable order pass and overall 14% verilation speed gain on a large design.	2022-01-07 12:05:52 +00:00
Wilson Snyder	ca42be982c	Copyright year update.	2022-01-01 08:26:40 -05:00
Geza Lore	987ce927eb	Remove unused code. No functional change.	2021-11-09 19:46:19 +00:00
Wilson Snyder	3a55600913	Internals: Restyle with C++11 using replacing typedef	2021-03-12 18:10:45 -05:00
Wilson Snyder	be31fdcfe4	Use Google-style-guide header guard naming, to avoid __ prefix.	2021-03-03 21:57:07 -05:00
Wilson Snyder	bd602d0e2d	Copyright year update	2021-01-01 10:29:54 -05:00
Wilson Snyder	c23de458ed	Misc internal coverage cleanups	2020-12-08 08:40:22 -05:00
Wilson Snyder	b6ded59c2b	Internals: Use and enforce class final for ~5% performance boost.	2020-11-18 21:32:16 -05:00
Wilson Snyder	1b0a48ea02	Internals: Use C++11 = default where obvious. No functional change intended.	2020-11-16 19:56:16 -05:00
Wilson Snyder	b67f1f0e94	Fix GCC warnings	2020-08-18 08:10:44 -04:00
Wilson Snyder	78aee6f4e7	C++11: Use sized enums (+4% performance).	2020-08-16 12:05:35 -04:00
Wilson Snyder	034737d2a8	C++11: Use member declaration initializations (in nodes). No functional change intended.	2020-08-16 11:44:06 -04:00
Wilson Snyder	c0127599df	C++11: Use nullptr. No functional change.	2020-08-16 11:44:05 -04:00
Wilson Snyder	5c966ec510	clang-format many files. No functional change. Use nodist/clang_formatter to reformat files that are now clean.	2020-04-13 22:52:23 -04:00
Wilson Snyder	1ce360ed5b	Add SPDX license identifiers. No functional change.	2020-03-21 11:24:24 -04:00
Wilson Snyder	0aabe6ce00	Internals: Fix cppcheck warning including missing init.	2020-02-03 22:10:29 -05:00
Wilson Snyder	73f5e3f808	Internals: Add missing const. No functional change.	2020-02-02 10:34:29 -05:00
Wilson Snyder	eafed88a6e	Internals: Add assertions. No functional change intended.	2020-01-25 10:19:59 -05:00
Wilson Snyder	a4e8d39932	Spelling fixes	2020-01-24 20:10:44 -05:00
Wilson Snyder	f23fe8fd84	Update copyright year.	2020-01-06 18:05:53 -05:00
Julien Margetts	cafb148a62	Commentary	2019-12-17 18:27:47 -05:00
Wilson Snyder	5811ec07e6	Update URLs to https://verilator.org	2019-11-07 22:33:59 -05:00
Wilson Snyder	771a301f66	Commentary: Remove newlines, upsets some patches. No functional change.	2019-10-04 20:17:11 -04:00
Wilson Snyder	e1e4bde125	Remove old V3ClkGater code	2019-08-27 17:51:06 -04:00
Wilson Snyder	b83b606267	Internals: Detab and fix spacing style issues. No functional change. When diffing, recommend using "git diff --ignore-all-space". When merging, recommend using "git merge -Xignore-all-space".	2019-05-19 16:13:13 -04:00
Wilson Snyder	8a4aeddbb0	Copyright year update.	2019-01-03 19:17:22 -05:00
Wilson Snyder	d87b9d25ca	Internals: Cleanup and standardize include order. No functional change intended.	2018-10-14 13:59:40 -04:00
Wilson Snyder	e37dce9d85	Internals: Add new graph algs for future partitioning.	2018-07-15 22:09:27 -04:00
Wilson Snyder	43694ec87c	Continued... Show file and line info when possible on internal graph errors.	2018-07-14 20:44:43 -04:00
Wilson Snyder	cf4bf9b7a5	Show file and line info when possible on internal graph errors.	2018-07-14 18:45:06 -04:00
Wilson Snyder	338ebcd6f0	Internal: Minor style cleanups for next merge. No functional change.	2018-07-14 17:27:05 -04:00
Wilson Snyder	84562f98de	Internals: Add GraphWay methods for future graph algs. No functional change.	2018-07-08 22:01:16 -04:00
Wilson Snyder	39ecfd9900	Internals: rank() public for future optimizers.	2018-06-26 17:57:57 -04:00
Wilson Snyder	4c9c39bd08	Merge from master	2018-06-16 07:32:32 -04:00
Wilson Snyder	6e7f28785e	Internals: Cleanup graph includes. No functional change.	2018-06-15 06:54:03 -04:00
Wilson Snyder	efb2801eeb	Internals: Add orderPreRanked. No functional change.	2018-06-14 20:29:54 -04:00
Wilson Snyder	94e8cf1de9	Internals: Use explicit std:: instead of using namespace std. No functional change intended.	2018-02-01 21:24:41 -05:00
Wilson Snyder	8e65d93d6d	Copyright year update. No functional change.	2018-01-02 18:05:06 -05:00
Wilson Snyder	a579e9273b	Support self-recursive modules, bug659.	2017-11-18 17:42:35 -05:00
Wilson Snyder	52c3031a82	Internals: Rename selfTest, no functional change.	2017-10-30 19:01:58 -04:00
John Coiner	4e98d96755	Internals: Add const's. No functional change intended. Signed-off-by: Wilson Snyder <wsnyder@wsnyder.org>	2017-10-26 18:42:50 -04:00
Wilson Snyder	e6d7e7e329	Version bump	2017-01-15 12:13:13 -05:00
Wilson Snyder	2d0084308d	Internals: Convert AstNUser to non-pointer to avoid NULL call. No functional change intended.	2016-11-27 09:40:12 -05:00
Wilson Snyder	b738d1960a	Copyright year update	2016-01-06 20:36:41 -05:00


72 Commits
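
The "Improve performance of MTask coarsening" commits above mention replacing SortByValueMap with hand-written pairing heaps whose nodes are stored directly in the objects they relate to. As a rough illustration of that general idea only, here is a minimal intrusive max-heap sketch. The names `Node`, `meld`, `mergePairs`, and `PairingHeap` are hypothetical and are not taken from the Verilator sources; the actual implementation may differ substantially.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical illustration; not Verilator's implementation.
// The heap links live inside the node itself (intrusive), so no
// separate associative container is needed to find a node's heap entry.
struct Node {
    int key;                // Score used for ordering
    Node* child = nullptr;  // Leftmost child
    Node* next = nullptr;   // Next sibling in the parent's child list
    explicit Node(int k) : key{k} {}
};

// Meld two heap roots; the root with the larger key wins (max-heap).
static Node* meld(Node* a, Node* b) {
    if (!a) return b;
    if (!b) return a;
    if (a->key < b->key) std::swap(a, b);
    b->next = a->child;  // Loser becomes the new leftmost child of the winner
    a->child = b;
    return a;
}

// Two-pass pairing: meld siblings in pairs left to right,
// then fold the resulting heaps together right to left.
static Node* mergePairs(Node* first) {
    std::vector<Node*> paired;
    while (first) {
        Node* const a = first;
        Node* const b = a->next;
        first = b ? b->next : nullptr;
        a->next = nullptr;
        if (b) b->next = nullptr;
        paired.push_back(meld(a, b));
    }
    Node* root = nullptr;
    for (auto it = paired.rbegin(); it != paired.rend(); ++it) root = meld(*it, root);
    return root;
}

class PairingHeap final {
    Node* m_root = nullptr;  // Heap does not own nodes; the caller does

public:
    bool empty() const { return !m_root; }
    Node* max() const { return m_root; }
    void insert(Node* nodep) {
        nodep->child = nullptr;
        nodep->next = nullptr;
        m_root = meld(m_root, nodep);
    }
    // Precondition: heap is not empty
    Node* popMax() {
        Node* const topp = m_root;
        m_root = mergePairs(topp->child);
        topp->child = nullptr;
        return topp;
    }
};
```

Insertion is a single meld, and the sorting work is deferred until elements are actually popped, which matches the commit's stated goal of not sorting merge candidates that are never considered. A production version would also need a way to remove or re-key an arbitrary node, which this sketch omits.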