This patch fixes a number of problems related to the divide and
modulus operators.
The net version (CA) of modulus did not support a signed version.
Division or modulus of a value wider than the machine word did
not correctly check for division by zero and return 'bx.
Fixed a problem in procedural modulus. The sign of the result is
only dependent on the L-value.
Division or modulus of a signed value that was the same width as
the machine word was creating an incorrect sign mask.
Division of a signed value that would fit into a single machine
word was not checking for division by zero.
Division or modulus of a wide value was always being done as
unsigned.
Added a negative operator for vvp_vector2_t. This made
implementing the signed wide division and modulus easier.
Support arrays of realtime variable arrays and net arrays. This
involved a simple fix to the ivl core parser, proper support in
the code generator, and rework the runtime support in vvp.
This patch splits any VVP net functor that needs to access both
statically and automatically allocated state into two sub-classes,
one for handling operations on statically allocated state, the
other for handling operations on automatically allocated state.
This undoes the increase in run-time memory use introduced when
automatic task/function support was first introduced.
This patch also fixes various issues with event handling in automatic
scopes. Event expressions in automatic scopes may now reference either
statically or automatically allocated variables or arrays, or part
selects or word selects thereof. More complex expressions (e.g.
containing arithmetic or logical operators, function calls, etc.) are
not currently supported.
This patch introduces some error checking for language constructs
that may not reference automatically allocated variables. Further
error checking will follow in a subsequent patch.
Start cleaning up shadowed variables, flagged by turning on -Wshadow.
No intended change in functionality. Patch looks right, and is tested
to compile and run on my machine. YMMV.
alloc_instance for real values was just passing a new double
to be added to the context items. The new constructor does
not set the default value so we need to do that manually.
The previous patch (commit 8b0ca902a6)
dealing with the possibilities of (unsigned long) and (vvp_time64_t)
being either the same or different managed to redefine UL_AND_TIME64_DIFF
in the 64-bit case. This does, of course, trigger a compiler warning.
That warning is repeated on every .cc file with a #include "config.h",
which is to say, just about every file.
This patch inverts the sense of the preprocessor conditional, calling
it UL_AND_TIME64_SAME. No more warnings!
A variable that is used to set the delay of a .delay statement
must be scaled to match the local units and for real values
rounded using the precision. This value is then converted to
the simulation precision.
The right shift of vvp_vector2_t needs to
account for and mask off shifted bits. Otherwise
there will be unexpected results after
a vvp_vector2_t::trim method.
Assume that anything that is strength aware already handles a
recv_vec8_pv and make the default function convert the bits
to a vec4 and then call recv_vec4_pv with this new value.
The double to vvp_vector4_t constructor was not using the correct
declaration for the bit words. This worked as long as unsigned and
unsigned long were the same size (usually).
The new real to int conversion was incorrectly setting the
bits for minus infinity to all ones. This is incorrect in a
two's complement encoding where the largest negative number
would be a leading 1 followed by an infinite number of zeros.
This patch makes .part/pv strength aware, resolv vec8_pv
aware. vvp_net_fun_t adds vec8_pv as a virtual function
with an appropriate error default. vvp_fun_signal should
full support vec8_pv (not tested and may not be needed).
This patch adds .cast/int and updates .cast/real to act as a local
(temporary) net and to support either a signed or unsigned input.
The vvp_vector4_t class not can convert an arbitrarily sized double
to a vector value. This removes the restriction of lround().
Also document the new statements.
First, handle the trivial (but possibly common) resolution cases in
inlined code, and only call the complete function for the complicated
cases. Then clean up the complex function for readability, and account
for the constraints that the front-end function established.
Arrays of vvp_vector4_t values redundantly store some fields in every
word. Create a special type that stores vvp_vector4_t values in a form
that does not duplicate the width of all the items. This can save a lot
of space when big memories are simulated.
The schedule_assign_plucked_vector is a better way to implement the
schedule_assign_vector, or at least no worse, so remove the now
redundent schedule_assign_vector.
The vvp_net_fun_t objects, and derived objects, are small, and are
created in large quantities. Tightly pack them into permanently
allocated space in order to save on system allocation overhead, and
thus save overall on memory.
The vvp_net_t objects are never deleted, so overload the new operator
to do a more space efficient permanent allocation.
The %assign/v instruction copied the vvp_vector4_t object needlessly
on its way to the scheduler. Eliminate that duplication.(cherry picked from commit d0f303463d)
The vvp_vector8_t constructor and destructor involve memory allocation
so it is best to pass these objects by reference as much as possible.
Also have the islands take more care not to perform resolution if the
inputs aren't really different.
NOTE: This is a port of commit 2f4e5bf5b6
from the "performance" branch, without the resolver scheduling changes.
This was causing test suite variances with pr1820472.v. It looks like
there might be a race in that program anyhow, but for now leave out the
resolver scheduling changes so that the rest of this commit can go in.
The %load/v instruction was doing some spurious resizes of the vector
that comes from the signal. Eliminate those resizes that can be
removed, and optimize some that remain.
The AND and OR operators for vvp_bit4_t are slightly tweaked to be
lighter and inlinable.
The vvp_vector4_t::set_bit is optimized to do less silly mask fiddling.
When processing wide vectors of these operations, it pays to process
them as vectors. This improves run-time performance. Have the run time
select vectorized or not based on the vector width.
Improve vvp_vector4_t methods copy_bits and the part selecting constructor
to make better use of vector words. Eliminate bit-by-bit processing by
these methods to take advantage of host processor words.
Improve vthread_bits_to_vector to use these improved methods and Update
the %load/av and %set/v instructions to take advantage of these changes.
These instructions can take advantage of the much optimized
vector_to_array function to do their arithmetic work quickly and
punt on X very quickly if needed. This helps some benchmarks.
The vvp_vector4_t often receives the results of vector arithmetic.
Add an optimized method for setting that data into the vector. Take
into account that arithmetic results have no X/Z bits, etc.
On some systems, 1UL<<X will make a mess if X is the size of
an unsigned long. This especially seems to be a problem on i386
systems. Protect those shifts in the vvp_net.cc.
By slightly altering the vvp_bit4_t encoding, a few simple
optimizations become possible. By making Z==2 and X==3, the
conversion from X/Z to X is a simple shift-or, and this can
be used to reduce the size of some of the bit4 operators.
The vvp_vector4_t holds 4-value logic. This patch changes the encoding
of 4-value bits in the vector to use separate A- and B bit vectors,
with the B- vector signaling the A- bits that are not 0/1. This
allows rapid conversion to 2-value logic, and rapid tests for X
and Z values.
This patch adds the ability to assign/deassign a bit or part select.
It also cleans up the code and fixes some problem in the forcing of
strength aware nets.
This patch adds functionality to do a bit or part select release
when a constant value is forced to the net/register. It also adds an
error message when the user tries to force a signal to a bit/part
select. This is not currently handled by the run time, so is now
caught in the compiler (tgt-vvp). Where when this functionality is
needed, it will be easy to know what to do instead of trying to track
down some odd runtime functionality.
What this all means is that you can force a signal to an entire
signal or you can force a constant to any part of a signal (bit,
part or entire) and release any of the above. Technically the
release of a constant value does not have to match the force.
The runtime verifies that if you are releasing a signal driver
it is being done as a full release. I don't see an easy way to
check this in the compiler.
To fix the signal deficiencies we need to rework the force_link
code to allow multiple drivers and partial unlinking. Much of
this is in the runtime, but the %force/link operator may also
need to be changed like I did to the %release opcode.
This patch fixes two problems. The first is that thr_check_addr()
was being used inconsistently. It should be passed a real address,
but the resize of the vector should be at least one more than this
address. The extra and unneeded CPU_WORD_BITS was also removed
from the routine.
The second problem involved an invalid memory access in
vvp_vector4_t::set_vec() when the vector being copied was an integer
multiple of the machine word width. Under this condition there would
be no remaining bits that needed to be copied but the routine was always
trying to copying some remaining bits. This code is now only executed
when there is a remainder.
Neither of these appear to be causing runtime problems. The second one
was found with valgrind. The first were found while tracking down the
second problem.