Commit Graph

72 Commits

Author SHA1 Message Date
Geza Lore 9ac64d0b92 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-20 21:18:50 +01:00
Geza Lore 4d81eb021d Revert "Improve performance of MTask coarsening"
This reverts commit 83475008d9.
2022-08-19 18:03:45 +01:00
Geza Lore 83475008d9 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-19 16:59:20 +01:00
Geza Lore cd50949a7e Reuse MTaskEdge instances in MT scheduling
Instead of deleting then re-allocating MTaskEdge instances when merging
two MTasks, just redirect the edged of the donor MTask to the recipient
MTask. This is both faster as it avoids an allocation and a deletion,
together with one update of the sibling maps, and also makes the
algorithm more stable due to MergeCandidate IDs being stable and
allocated up front for all MTaskEdges, before any SiblingMCs are
allocated.

Perturbations in output are expected as the IDs used to break ties
between merge candidates with equal costs are not updated when
redirecting an edge (on purpose). The relinking of only one end of the
graph edges also perturbs the order in which they are enumerated, which
does change candidate opportunities when the number of edges is larger
than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when
IDs are updated and edges are updated to appear in their original order.
2022-08-19 14:06:11 +01:00
Geza Lore b436794773 Add specialized GraphStreamUnordered
GraphStreamUnordered used to be GraphStream<std::less<const
V3GraphVertex*>>, but a lot of performance improvements can be had by a
specialized implementation, so added a highly optimized one. This helps
a lot with --debug-partition.
2022-08-19 14:06:11 +01:00
Geza Lore a2792785fe Add V3GraphVertex::dotRank to add GraphViz ranks to graph dumps
This is a simple debugging aid to allow constraining the graph layout
via GraphViz rank directives. Note this is not related in any way to the
vertex 'rank' attribute used by some of the graph algorithms.

No functional change.
2022-05-02 10:27:26 +01:00
Geza Lore 2ba9eb4228 Speed up TSP sort implementation
- More efficient comparison by pre-computing sorting keys.
- Remove work items in algorithms known to be redundant earlier.
  This greatly reduces data structure sizes.
- Use V3GraphVertex->user() for state tracking instead of unordered_map
  while both of these are constant time, they do add up.
- In `makeMinSpanningTree`, instead of batch inserting outgoing edges of
  each visited vertex into an ordered set, keep an ordered set of sorted
  vectors of edges. This reduces the size of the ordered set
  significantly (it is now O(V) rather than O(E), and as the subject
  graph is a complete graph, V ~ sqrt(E), so this is a significant gain).
- Use a vector + sorting in `perfectMatching` instead of an ordered set.
  This is faster on large working sets.

This yields 3.8x speedup on the variable order pass and overall 14%
verilation speed gain on a large design.
2022-01-07 12:05:52 +00:00
Wilson Snyder ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Geza Lore 987ce927eb Remove unused code. No functional change. 2021-11-09 19:46:19 +00:00
Wilson Snyder 3a55600913 Internals: Restyle with C++11 using replacing typedef 2021-03-12 18:10:45 -05:00
Wilson Snyder be31fdcfe4 Use Google-style-guide header guard naming, to avoid __ prefix. 2021-03-03 21:57:07 -05:00
Wilson Snyder bd602d0e2d Copyright year update 2021-01-01 10:29:54 -05:00
Wilson Snyder c23de458ed Misc internal coverage cleanups 2020-12-08 08:40:22 -05:00
Wilson Snyder b6ded59c2b Internals: Use and enforce class final for ~5% performance boost. 2020-11-18 21:32:16 -05:00
Wilson Snyder 1b0a48ea02 Internals: Use C++11 = default where obvious. No functional change intended. 2020-11-16 19:56:16 -05:00
Wilson Snyder b67f1f0e94 Fix GCC warnings 2020-08-18 08:10:44 -04:00
Wilson Snyder 78aee6f4e7 C++11: Use sized enums (+4% performance). 2020-08-16 12:05:35 -04:00
Wilson Snyder 034737d2a8 C++11: Use member declaration initalizations (in nodes). No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder c0127599df C++11: Use nullptr. No functional change. 2020-08-16 11:44:05 -04:00
Wilson Snyder 5c966ec510 clang-format many files. No functional change.
Use nodist/clang_formatter to reformat files that are now clean.
2020-04-13 22:52:23 -04:00
Wilson Snyder 1ce360ed5b Add SPDX license identifiers. No functional change. 2020-03-21 11:24:24 -04:00
Wilson Snyder 0aabe6ce00 Internals: Fix cppcheck warning including missing init. 2020-02-03 22:10:29 -05:00
Wilson Snyder 73f5e3f808 Internals: Add missing const. No functional change. 2020-02-02 10:34:29 -05:00
Wilson Snyder eafed88a6e Internals: Add assertions. No functional change intended. 2020-01-25 10:19:59 -05:00
Wilson Snyder a4e8d39932 Spelling fixes 2020-01-24 20:10:44 -05:00
Wilson Snyder f23fe8fd84 Update copyright year. 2020-01-06 18:05:53 -05:00
Julien Margetts cafb148a62 Commentary 2019-12-17 18:27:47 -05:00
Wilson Snyder 5811ec07e6 Update URLs to https://verilator.org 2019-11-07 22:33:59 -05:00
Wilson Snyder 771a301f66 Commentary: Remove newlines, upsets some patches. No functional change. 2019-10-04 20:17:11 -04:00
Wilson Snyder e1e4bde125 Remove old V3ClkGater code 2019-08-27 17:51:06 -04:00
Wilson Snyder b83b606267 Internals: Detab and fix spacing style issues. No functional change.
When diff, recommend using "git diff --ignore-all-space"
When merging, recommend using "git merge -Xignore-all-space"
2019-05-19 16:13:13 -04:00
Wilson Snyder 8a4aeddbb0 Copyright year update. 2019-01-03 19:17:22 -05:00
Wilson Snyder d87b9d25ca Internals: Cleanup and standardize include order. No functional change intended. 2018-10-14 13:59:40 -04:00
Wilson Snyder e37dce9d85 Internals: Add new graph algs for future partitioning. 2018-07-15 22:09:27 -04:00
Wilson Snyder 43694ec87c Continued... Show file and line info when possible on internal graph errors. 2018-07-14 20:44:43 -04:00
Wilson Snyder cf4bf9b7a5 Show file and line info when possible on internal graph errors. 2018-07-14 18:45:06 -04:00
Wilson Snyder 338ebcd6f0 Internal: Minor style cleanups for next merge. No functional change. 2018-07-14 17:27:05 -04:00
Wilson Snyder 84562f98de Internals: Add GraphWay methods for future graph algs. No functional change. 2018-07-08 22:01:16 -04:00
Wilson Snyder 39ecfd9900 Internals: rank() public for future optimizers. 2018-06-26 17:57:57 -04:00
Wilson Snyder 4c9c39bd08 Merge from master 2018-06-16 07:32:32 -04:00
Wilson Snyder 6e7f28785e Internals: Cleanup graph includes. No functional change. 2018-06-15 06:54:03 -04:00
Wilson Snyder efb2801eeb Internals: Add orderPreRanked. No functional change. 2018-06-14 20:29:54 -04:00
Wilson Snyder 94e8cf1de9 Internals: Use explicit std:: instead of using namespace std. No functional change intended. 2018-02-01 21:24:41 -05:00
Wilson Snyder 8e65d93d6d Copyright year update. No functional change. 2018-01-02 18:05:06 -05:00
Wilson Snyder a579e9273b Support self-recursive modules, bug659. 2017-11-18 17:42:35 -05:00
Wilson Snyder 52c3031a82 Internals: Rename selfTest, no functional change. 2017-10-30 19:01:58 -04:00
John Coiner 4e98d96755 Internals: Add const's. No functional change intended.
Signed-off-by: Wilson Snyder <wsnyder@wsnyder.org>
2017-10-26 18:42:50 -04:00
Wilson Snyder e6d7e7e329 Version bump 2017-01-15 12:13:13 -05:00
Wilson Snyder 2d0084308d Internals: Convert AstNUser to non-pointer to avoid NULL call. No functional change intended. 2016-11-27 09:40:12 -05:00
Wilson Snyder b738d1960a Copyright year update 2016-01-06 20:36:41 -05:00