Whether and what to propagate after a release of a part needs to
match the behavior of the full-vector release. Nets need to restore
their driver, and regs need to hold their forced value.
When releasing a net, the release needs to propagate the driven
value. When releasing a variable, the driven value must be set
to the previously forced value.
These opcodes need to return 'bx or 0.0 for the real opcode when
the array index is undefined.
The patch also documents the auto incrementing of the bit
index register done by the %load/avx.p opcode.
The %shiftr/i0 and %shiftl/i0 opcodes are used for some part
selects and if we have a negative shift we want the value to be
padded with 'bx. This patch enhances the two %shift/i0 opcodes
to work with negative shifts and for negative shifts pad with
'bx instead of 'b0.
It also fixes %ix/get/s to use a uint64_t instead of a unsigned
long to avoid problems with sign extension on 32 bit machines.
Two small fixes: Threads should load signal values from signal_value
objects, not signal functors, and the force method should not run
its value through the filter.
Take wires out of the signals/variables and move them into a filter
instead. This is a big shift, and finally starts us on the path to
divide wires out of signals.
We want the entire force/release subsystem to only reference the
vvp_net_t or vvp_net_fil_t objects in a net. This gives us the
latitude to take wire implementations out of the vvp_net_fun classes.
The vvp_fun_force node converts its input to a call to the
force method of the target node. This eliminates the need for
linking a net to a force input of a signal.
The vvp_net_t port 2 was used to implement force behavior, but that
is no longer how we plan to implement force, so remove it from the
implementation of signal nodes. This currently breaks much of the
force/release functionality, but we'll get it back by other means.
This patch fixes the three %assign/v0/x1 operators to correctly
notice that the select has fallen off the start of the vector
for the case that the negative offset equaled the width.
There is no use implementing the release and deassign methods as
port commands. It's confusing and a waste of vvp_net_t functionality.
It also obscures what needs to be done to more force/release into
the filter object.
the vvp_net.h header file is getting pretty huge. This divides
the obviously separable signal functor code out into its own
header and source files.
Also, fill out the use of the filter member of the vvp_net_t
object. Test the output of the vvp_net_t against the filter.
The out pointer of a vvp_net_t object is going to be a bit more
sophisticated when we redo the handling of net signals. Take a step
towards this rework by making the pointer private and implementing
methods needed to access it.
The Multiword division was not handling some degenerate high
guesses for the intermediate division result guess. The end result
was an assertion. Recover from this case.
(Does the addinb back of bp need to be optimized better?)
This patch adds support for 64 bit non-blocking delays in procedural
code. We fixed the procedural delay operator (blocking delays) earlier.
This patch mostly mimics what was done there. The continuous assignment
delay operator still needs to be fixed.
This patch adds code to free most of the memory when vvp
finishes. It also adds valgrind hooks to manage the various
memory pools. The functionality is enabled by passing
--with-valgrind to configure. It requires that the
valgrind/memcheck.h header from a recent version of
valgrind be available. It check for the existence of this
file, but not that it is new enough (version 3.1.3 is known
to not work and version 3.4.0 is known to work).
You can still use valgrind when this option is not given,
but you will have memory that is not released and the
memory pools show as a single block.
With this vvp is 100% clean for many of the tests in the
test suite. There are still a few things that need to be
cleaned up, but it should be much easier to find any real
leaks now.
Enabling this causes a negligible increase in run time and
memory. The memory could be a problem for very large
simulations. The increase in run time is only noticeable on
very short simulations where it should not matter.
This patch adds the procedural power function %pow/s for signed
values. This has bit based inputs and outputs, but uses the double
pow() function to calculate the value.
The VVP %join function was incorrectly treating the return from a
non-automatic function as a return from an automatic function in
the case that the non-automatic function result was being used as
a parameter to an automatic function. This patch fixes this error.
This patch fixes a number of problems related to the divide and
modulus operators.
The net version (CA) of modulus did not support a signed version.
Division or modulus of a value wider than the machine word did
not correctly check for division by zero and return 'bx.
Fixed a problem in procedural modulus. The sign of the result is
only dependent on the L-value.
Division or modulus of a signed value that was the same width as
the machine word was creating an incorrect sign mask.
Division of a signed value that would fit into a single machine
word was not checking for division by zero.
Division or modulus of a wide value was always being done as
unsigned.
Added a negative operator for vvp_vector2_t. This made
implementing the signed wide division and modulus easier.
Support arrays of realtime variable arrays and net arrays. This
involved a simple fix to the ivl core parser, proper support in
the code generator, and rework the runtime support in vvp.
This patch splits any VVP net functor that needs to access both
statically and automatically allocated state into two sub-classes,
one for handling operations on statically allocated state, the
other for handling operations on automatically allocated state.
This undoes the increase in run-time memory use introduced when
automatic task/function support was first introduced.
This patch also fixes various issues with event handling in automatic
scopes. Event expressions in automatic scopes may now reference either
statically or automatically allocated variables or arrays, or part
selects or word selects thereof. More complex expressions (e.g.
containing arithmetic or logical operators, function calls, etc.) are
not currently supported.
This patch introduces some error checking for language constructs
that may not reference automatically allocated variables. Further
error checking will follow in a subsequent patch.
Start cleaning up shadowed variables, flagged by turning on -Wshadow.
No intended change in functionality. Patch looks right, and is tested
to compile and run on my machine. YMMV.
This patch adds non-blocking event control for array words.
It also fixes a problem where the word used to put the
calculated delay for a non-blocking array assignment was
not being released. It also fixes the non-blocking array
assignments to correctly handle off the end/beginning part
selects.
Since some event control assignments can be skipped we need an
event control clear so that future %evctl statements do not fail
their assert. This patch adds %evctl/c and uses it in the compiler
as appropriate to keep the event control information in sync.
This patch adds full event control for vectors and parts of a
vector. It also fixes the other non-blocking part select code
to correctly handle a negative offset ([1:-2] of a [4:0] will
have an offset of -2).
This patch pushes the non-blocking event control information to
the code generator. It adds the %evctl statements that are used
to put the event control information into the special thread
event control registers. The signed version (%evctl/s) required
the implementation of %ix/getv/s to load a signed value into
an index register. It then adds %assign/wr/e event control based
non-blocking assignment for real values. It also fixes the other
non-blocking real assignments to use Transport instead of inertial
delays.
Nothing to do with tab width! Eliminates useless
trailing spaces and tabs, and nearly all <space><tab>
pairings. No change to derived files (e.g., .vvp),
non-master files (e.g., lxt2_write.c) or the new tgt-vhdl
directory.
Low priority, simple entropy reduction. Please apply
unless it deletes some steganographic content you want
to keep.
This patch causes a thread that is created to evaluate a function
to be executed immediately. The parent thread is resumed when the
function thread terminates.
Logical (in)equality needs to look at all the bits of both operands,
and cannot short circuit the test unless defined bits differ. If there
are undefined bits, the equality is undefined at that point, but return
x only if there are not other bits that make the results clearly
unequal.
This patch updates the %cvt/vr command to use the new double to vector
constructor. This allows the resulting bit pattern to be larger than
a long. The old method was producing incorrect results without a
warning for large bit values.
The schedule_assign_plucked_vector is a better way to implement the
schedule_assign_vector, or at least no worse, so remove the now
redundent schedule_assign_vector.
It is legal (though worthy of a warning, I think) for the part select
of an l-value to me out of bounds, so replace the error message with
a warning, and generate the appropriate code. In the process, clean
up some of the code for signal l-values to divide out the various kinds
of processing that can be done. This cleans things up a bit.
The load-and-add for vectors %load/vp0/s can be combined with the
load-and-add for array words, and the %load/avp0/s added to round
out the combinations. This can make for fewer instructions when
words are padded in arithmetic expressions.
The %load/vp0 instruction adds a signed value to the signal value being
loaded, but it doesn't allow for a signed source vector. Add the
%load/vp0/s instruction that pads the loaded vector, and add the code
generator details to properly use it.
The vvp_net_t objects are never deleted, so overload the new operator
to do a more space efficient permanent allocation.
The %assign/v instruction copied the vvp_vector4_t object needlessly
on its way to the scheduler. Eliminate that duplication.(cherry picked from commit d0f303463d)
The %load/v instruction was doing some spurious resizes of the vector
that comes from the signal. Eliminate those resizes that can be
removed, and optimize some that remain.
The multiply runs does not need to do all the combinations of digit
products, because the higher ones cannot add into the result. Fix the
iteration to limit the scan.
Clarify that operands are typically 32bits, and have the code generator
make better use of this.
Also improve the %movi implementation to work well with marger vectors.
Add the %andi instruction to use immediate operands.
Use high radix long division to take advantage of the divide hardware
of the host computer. It looks brute force at first glance, but since
it is using the optimized arithmetic of the host processor, it is much
faster then implementing "fast" algorithms the hard way.
This instruction adds an integer value to the value being loaded. This
optimization uses subarrays instead of the += operator. This is faster
because the value is best loaded into the vector as a subarray anyhow.
When processing wide vectors of these operations, it pays to process
them as vectors. This improves run-time performance. Have the run time
select vectorized or not based on the vector width.
Improve vvp_vector4_t methods copy_bits and the part selecting constructor
to make better use of vector words. Eliminate bit-by-bit processing by
these methods to take advantage of host processor words.
Improve vthread_bits_to_vector to use these improved methods and Update
the %load/av and %set/v instructions to take advantage of these changes.
These instructions can take advantage of the much optimized
vector_to_array function to do their arithmetic work quickly and
punt on X very quickly if needed. This helps some benchmarks.
The abs() function needs to be able to turn -0.0 into 0.0. This proved
to be too clunky (and perhaps impossible) to do with tests and jumps,
so add an %abs/wr opcode to do it using fabs().
The min/max functions need to take special care with the handling
of NaN operands. These matter, so generate the extra code to handle
them.
The %sub instruction didn't have the efficent implementation that
the %add instructions used. Update subtraction to use the array
method, so that it gets the same performance benefits.
The vvp_vector4_t often receives the results of vector arithmetic.
Add an optimized method for setting that data into the vector. Take
into account that arithmetic results have no X/Z bits, etc.
Remove dependencies on vvp_bit4_encoding outside of the vvp_net
core types. The table_functor_s class was the worst offfender and
was barely used, so it is now removed completely. There are a few
opcodes in vhtread.cc that also make vvvp_bit4_t encoding
assumptions (and used casts) and those have been fixed. There
were also various VPI interface functions that are fixed.
This patch fixes deassign to allow it to unlink from a driver.
It also zeros the cassign_link and force_link pointers after
they have been unlinked. Not doing this will cause an assert
if deassign/release are called multiple times (variable only).
This patch adds the ability to assign/deassign a bit or part select.
It also cleans up the code and fixes some problem in the forcing of
strength aware nets.
This patch adds a %assign/av/d opcode. This is a version of %assign/av
that allows a delay expression. Ultimately this allows a dynamically
indexed array to have a delay expression (non-constant delay value).
Threads used to be deleted when they finished processing code.
The problem with this is that some of the code could be
rescheduled to run at rosync ($strobe, etc.). This allowed the
thread data the code depended on to be reaped too soon. This
patch uses a new queue to schedule thread deletion. The queue
is processed after rosync has finished.
This patch adds functionality to do a bit or part select release
when a constant value is forced to the net/register. It also adds an
error message when the user tries to force a signal to a bit/part
select. This is not currently handled by the run time, so is now
caught in the compiler (tgt-vvp). Where when this functionality is
needed, it will be easy to know what to do instead of trying to track
down some odd runtime functionality.
What this all means is that you can force a signal to an entire
signal or you can force a constant to any part of a signal (bit,
part or entire) and release any of the above. Technically the
release of a constant value does not have to match the force.
The runtime verifies that if you are releasing a signal driver
it is being done as a full release. I don't see an easy way to
check this in the compiler.
To fix the signal deficiencies we need to rework the force_link
code to allow multiple drivers and partial unlinking. Much of
this is in the runtime, but the %force/link operator may also
need to be changed like I did to the %release opcode.
This patch reworks much of the ternary code to short circuit when
possible and supports real values better. It adds a blend operator
for real values that returns 0.0 when the values differ and the value
when they match. This deviates slightly from the standard which
specifies that the value for reals is always 0.0 when the conditional
is 'bx. There are also a couple bug fixes.
These fixes have not been ported to continuous assignments yet.
Ternary operators used at compile time and in procedural assignments
should be complete (short circuit and support real values).
draw_number_bool64() in tgt-vvp/eval_bool.c was using %ix/load to
load immediate values into registers greater than three. The problem
was that of_IX_LOAD() in vvp/vthread.cc was masking off the upper
bits. This was putting the results in the wrong register. This patch
removes the bit masking from of_IX_LOAD() and updates the %ix/load
documentation.
This patch fixes two problems. The first is that thr_check_addr()
was being used inconsistently. It should be passed a real address,
but the resize of the vector should be at least one more than this
address. The extra and unneeded CPU_WORD_BITS was also removed
from the routine.
The second problem involved an invalid memory access in
vvp_vector4_t::set_vec() when the vector being copied was an integer
multiple of the machine word width. Under this condition there would
be no remaining bits that needed to be copied but the routine was always
trying to copying some remaining bits. This code is now only executed
when there is a remainder.
Neither of these appear to be causing runtime problems. The second one
was found with valgrind. The first were found while tracking down the
second problem.
This patch adds the power operator for unsigned bit based values
in a continuous assignment. It also refactors the power code for
normal expressions and continuous assignments.
This patch adds bit based power support to normal expressions.
It also pushes the constant unsigned bit based calculation to
the runtime until the bit based method can be copied to the
compiler. Continuous assignments also need to use this type
of calculation.
This patch adds a new opcode %load/avp0 that is used to load a
word from an array and add a value to it. %load/vp0 was
changed/fixed to do the summation at the result width not the
vector width. This allows small vectors to index large arrays with
an offset. A few errors in the opcodes.txt file were also fixed.
The %ix/getv instruction loads an integer register directly from
a signal vector. This is an optimization that an index register
is loaded from an expression is only a signal. It avoids the thread
vector space.
Where and expression is an immediate value added to a signal value,
it is possible to optimize them to a single instruction that combines
the load with an add at the same time.
When nets are forced by non-constant expressions, the value is linked
to the destination net through a force_link. This patch adds the code
needed to unlink a force/link so that release and new force/links can
work. This fixes pr1735836.
Signed-off-by: Stephen Williams <steve@icarus.com>
signed compare in proceedural code was comparing the absolute
value if both operands were negative. Wrong!
Signed-off-by: Stephen Williams <steve@icarus.com>
Implement compare-immediate instructions and generate code to use
these new instructions to improve runtime performance.
Signed-off-by: Stephen Williams <steve@icarus.com>
Real value are vector width of 1, fix real literal to reflect this.
fix leaking real registers in code generation for function arguments.
Load of signal should handle conversion from real to vector. Function
arguments, type vector passed a real value, are an example where this
comes up.
Signed-off-by: Stephen Williams <steve@icarus.com>
more general concept of arrays. The NetMemory and NetEMemory
classes are removed from the ivl core program, and the IVL_LPM_RAM
lpm type is removed from the ivl_target API.