The DFG peephole pass converts all associative trees into right-leaning form, which simplifies pattern recognition but can cause an excessive number of wide intermediate results to be constructed for right-leaning concatenations.

Add a new pass to balance concatenation trees by trying to:
- Create VL_EDATASIZE (32-bit) sub-terms, so words can then be packed easily afterwards
- Ensure the operands of a concat are roughly the same width within a concatenation tree

This does not yield the shortest tree, but it ensures the tree has many sub-nodes that are small enough to fit into machine registers. This can eliminate a lot of wide intermediate results, which would need temporaries, and also increases ILP within sub-expressions (assuming the C compiler can't figure that out itself).

This is over 2x run-time speedup on the high_perf configuration of VeeR EH2 (which you could arguably also get with -fno-dfg, but oh well).
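A minimal sketch of the width-balancing idea, not the actual pass: `Node`, `makeLeaf`, `makeConcat`, `buildRightLeaning`, `buildBalanced` and `countWide` are hypothetical names invented for illustration, and the sketch only covers the "roughly equal operand widths" goal, not the packing into VL_EDATASIZE words. It builds the same concatenation both ways and counts the concat nodes wider than 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DFG vertex: a leaf term or a two-operand concat.
struct Node {
    uint32_t width = 0;
    std::unique_ptr<Node> lhs;  // null for a leaf
    std::unique_ptr<Node> rhs;  // null for a leaf
};
using NodePtr = std::unique_ptr<Node>;

NodePtr makeLeaf(uint32_t width) {
    NodePtr n(new Node);
    n->width = width;
    return n;
}

NodePtr makeConcat(NodePtr lhs, NodePtr rhs) {
    assert(lhs && rhs);
    NodePtr n(new Node);
    n->width = lhs->width + rhs->width;
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

// Right-leaning shape: {a, {b, {c, d}}}. Every inner concat carries
// all of the remaining bits, so most intermediate results are wide.
NodePtr buildRightLeaning(const std::vector<uint32_t>& widths, size_t begin, size_t end) {
    if (end - begin == 1) return makeLeaf(widths[begin]);
    return makeConcat(makeLeaf(widths[begin]), buildRightLeaning(widths, begin + 1, end));
}

// Balanced shape: split the terms where the two operand widths are
// closest to equal, then recurse into both halves.
NodePtr buildBalanced(const std::vector<uint32_t>& widths, size_t begin, size_t end) {
    if (end - begin == 1) return makeLeaf(widths[begin]);
    uint64_t total = 0;
    for (size_t i = begin; i < end; ++i) total += widths[i];
    uint64_t acc = widths[begin];  // width of the left operand so far
    size_t split = begin + 1;      // first index of the right operand
    uint64_t bestDiff = acc > total - acc ? 2 * acc - total : total - 2 * acc;
    for (size_t i = begin + 1; i + 1 < end; ++i) {
        acc += widths[i];
        const uint64_t diff = acc > total - acc ? 2 * acc - total : total - 2 * acc;
        if (diff < bestDiff) {
            bestDiff = diff;
            split = i + 1;
        }
    }
    return makeConcat(buildBalanced(widths, begin, split), buildBalanced(widths, split, end));
}

// Number of concat nodes whose result does not fit in one 32-bit word.
size_t countWide(const Node& n) {
    if (!n.lhs) return 0;
    return (n.width > 32 ? 1 : 0) + countWide(*n.lhs) + countWide(*n.rhs);
}
```

For sixteen 8-bit terms, both shapes contain 15 concat nodes, but the right-leaning tree has 12 of them wider than 32 bits while the balanced tree has only 3 (the 128-bit root and its two 64-bit operands).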
||
|---|---|---|
| .. | ||
| t | ||
| .gdbinit | ||
| .gitignore | ||
| CMakeLists.txt | ||
| Makefile | ||
| Makefile_obj | ||
| driver.py | ||
| input.vc | ||
| input.xsim.vc | ||