The goal here is to use as single ordering heuristic (which can be
improved later) within MTasks as we do for serial code ordering. The
heuristic itself is factored out into the new OrderMoveGraphSerializer.
This also yields slightly nicer ordering than the previously use
GraphStream, so we end up with fewer trigger (domain) conditionals in
the MTasks, this can be worth a few percent speedup.
This has the somewhat nice side-effect of reusing OrderMoveGraphVertex
for both serial and parallel mode, so MTaskMoveGraphVertex can be
removed.
Serial mode yields identical output.
There is no strong need to re-map LogicMTask IDs and it just adds extra
processing. Instead we just allocate a separate set of ExecMTask IDs as
they are created, which can also be used as the unique profiling ID as
well. The only effect on the output of this is the change in mtask IDs
emitted, which was fairly arbitrary to begin with.
Wrapping the functions in #4933 broke --prof-exec report as the
predicted MTask times are computed during thread packing, but are
emitted in the wrapping functions.
V3Partition used to contain 2 conceptually separate set of algorithms
- The MTask partitioning/coarsening algorithm used by V3Order. This has
been moved to V3OrderParallel.cpp
- The lowering of AstExecGraph into per thread functions by packing
tasks into threads and creating additional code
(V3Partition::finalize). This has been moved to the new
V3ExecGraph.cpp
This patch is just code movement/rename with minimal fixes required to
do so.