The vvp_island classes are added, as well as support for tranif nodes
that use this concept. The result is a working implementation for
tranif0 and tranif1.
In the process, the symbol table functions were cleaned up and made
into templates for better type safety, and the vvp_net_ptr_t was
generalized so that it can be used by the branches in the island
implementation.
Also fix up the array handling to use the better symbol table support,
and to remember to clear its own table when linking is done.
The code generator was reading the wrong node of a bi-directional
part select. This happens exclusively with part selects passed to
bi-directional ports, so was rare. The result was that the non-part
selected part may get an incorrect value.
The multiply runs does not need to do all the combinations of digit
products, because the higher ones cannot add into the result. Fix the
iteration to limit the scan.
ivl_switch_scope, ivl_switch_attr_cnt and ivl_switch_attr_val
are non-existent routines and should not be in ivl.def. I also
removed them from ivl_target.h. Cygwin expects that if a routine
is listed in ivl.def that it will find a real implementation.
My previous patch always used an empty file if git was not
available. This patch extends this to use the existing
version.h file if it exists (snapshot, etc.)
The handling of immediate add used to do 16bits at a time. When it went
up to 32bits, the need to work in chunks vanished, but the chunk handling
was still there, this time shifting by 32, which causes problems on 32bit
machines. Simplify the %addi handling to avoid this.
Clarify that operands are typically 32bits, and have the code generator
make better use of this.
Also improve the %movi implementation to work well with marger vectors.
Add the %andi instruction to use immediate operands.
Use high radix long division to take advantage of the divide hardware
of the host computer. It looks brute force at first glance, but since
it is using the optimized arithmetic of the host processor, it is much
faster then implementing "fast" algorithms the hard way.
This instruction adds an integer value to the value being loaded. This
optimization uses subarrays instead of the += operator. This is faster
because the value is best loaded into the vector as a subarray anyhow.
The AND and OR operators for vvp_bit4_t are slightly tweaked to be
lighter and inlinable.
The vvp_vector4_t::set_bit is optimized to do less silly mask fiddling.
This involves defining the API for switches and cleaning up the
elaborated form to match the defined ivl_target API. Also add t-dll
code to support the ivl_switch_t functions, and add stub code that
checks the results.
When processing wide vectors of these operations, it pays to process
them as vectors. This improves run-time performance. Have the run time
select vectorized or not based on the vector width.
Improve vvp_vector4_t methods copy_bits and the part selecting constructor
to make better use of vector words. Eliminate bit-by-bit processing by
these methods to take advantage of host processor words.
Improve vthread_bits_to_vector to use these improved methods and Update
the %load/av and %set/v instructions to take advantage of these changes.
The MinGW system() implementation appears to return the straight
return value instead of the waitpid() like result that more
normal systems return. Because of this just return the system()
result without processing for MinGW compilations.
Older version of the MinGW runtime (pre 3.14) just used the
underlying vsnprintf(). Which has some problems. The 3.14 version
has some nice improvements, but it has a sever bug when processing
"%*.*f", -1, -1, <some_real_value>. Because of this we need to use
the underlying version without the enhancements for now.
snprintf prints %p differently than the other printf routines
so use _snprintf to get consistent results.
Only build the PDF files if both man and ps2pdf exist.
MinGW does not know about the z modifier for %d, %u, etc.
Add some missing Makefile check targets.
These instructions can take advantage of the much optimized
vector_to_array function to do their arithmetic work quickly and
punt on X very quickly if needed. This helps some benchmarks.
If the select of a MUXZ is constant 0 or 1, then we can elide the MUX
in place of a simple BUFZ. This should allow the dead code on the
unused side to be removed as well.
Functions like $monitor need to attach callbacks to array words if
those words are to be monitored. Have the array hold all the callbacks
for words in the array, under the assumption that the monitored words
are sparse.