Instead of carrying around MTask affinity from scheduling, compute it in
V3VariableOrder (where it is used), by tracing through the code. This
simplifies some code and has the benefit of handling variables
introduced after scheduling. This gains a few percent of speed at model
run-time, and the new implementation of V3VariableOrder is slightly more
efficient, though its own speed and memory use are still dominated by the
TSP sort.

A separate V3VariableOrder pass is now used to order module variables
before Emit. All variables are now ordered together, without considering
whether they are ports, signals from the design, or additional internal
variables added by Verilator (these used to be ordered and emitted as
separate groups in Emit). For single-threaded models, this is performance
neutral. For multi-threaded models, the MTask affinity based sorting was
slightly modified: variables with no MTask affinity are now emitted last,
while the MTask affinity sets are still sorted using the TSP sorter as
before. Again, ports, signals, and internal variables are not
differentiated. This yields a 2%+ speedup for the multi-threaded model on
OpenTitan.
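
The resulting ordering can be sketched as follows, again under stated
assumptions: `orderVariables` and `setDistance` are hypothetical names, and a
greedy nearest-neighbour walk over the affinity sets (cost = size of the
symmetric difference) stands in for Verilator's actual TSP sorter. Variables
sharing an affinity set stay together in their original order, variables
with no affinity are appended last, and ports, signals, and internal
variables are treated alike.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using MTaskSet = std::set<int>;

// Cost between two affinity sets: number of MTasks in one but not the other.
static std::size_t setDistance(const MTaskSet& a, const MTaskSet& b) {
    std::size_t d = 0;
    for (int x : a) d += b.count(x) ? 0 : 1;
    for (int x : b) d += a.count(x) ? 0 : 1;
    return d;
}

// vars: variables in their original order; affinity: variable -> MTask ids
// (a variable missing from the map, or with an empty set, has no affinity).
// Hypothetical sketch, not Verilator's V3VariableOrder implementation.
std::vector<std::string> orderVariables(const std::vector<std::string>& vars,
                                        const std::map<std::string, MTaskSet>& affinity) {
    std::map<MTaskSet, std::vector<std::string>> groups;  // affinity set -> members
    std::vector<std::string> noAffinity;
    for (const std::string& v : vars) {
        const auto it = affinity.find(v);
        if (it == affinity.end() || it->second.empty()) noAffinity.push_back(v);
        else groups[it->second].push_back(v);
    }
    // Greedy nearest-neighbour tour over the groups (a stand-in for TSP).
    std::vector<std::string> result;
    MTaskSet prev;  // affinity set of the last emitted group; starts empty
    while (!groups.empty()) {
        auto best = groups.end();
        std::size_t bestDist = 0;
        for (auto it = groups.begin(); it != groups.end(); ++it) {
            const std::size_t d = setDistance(prev, it->first);
            if (best == groups.end() || d < bestDist) {
                best = it;
                bestDist = d;
            }
        }
        result.insert(result.end(), best->second.begin(), best->second.end());
        prev = best->first;  // copy before erasing the group
        groups.erase(best);
    }
    // Variables with no MTask affinity go last.
    result.insert(result.end(), noAffinity.begin(), noAffinity.end());
    assert(result.size() == vars.size());
    return result;
}
```

For example, with affinities p:{0,1}, q:{0}, s:{1}, t:{0} and r unreferenced
by any MTask, the sketch yields q, t, p, s, r. In a single-threaded model no
variable has an affinity, so the original order is kept.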