The process of inverting and copying can be collapsed into a single
operation that should run a little faster. Also, inverting readily
vectorizes itself. I've also possibly reduced useless vvp_not_fun
iterations.
Also, include some minor tweaks to the vvp_vector8_t handling of
vector copies. This is more to clean up code, although it should
slightly improve performance as well.
The vvp thread word storage had previously been changed to always store
64-bit values, but some instructions still only operate on native long
values. This patch ensures all instructions that modify thread words
support 64-bit values.
For the %mov instruction, implement a vvp_vector4_t::mov method to
manipulate the thread vector directly.
For the %load/v instruction, rework the vec4_value() methods to
avoid creating vvp_vector4_t temporaries, and therefore reduce the
copy overhead.
This patch adds cleanup code that cleans up the memory that is
allocated by the of_EXEC_UFUNC command. This knocks a few more
files off the valgrind list.
Whether and what to propagate after a release of a part needs to
match the behavior of the full-vector release. Nets need to restore
their driver, and regs need to hold their forced value.
filters need to be able to cope with parts of vectors moving through
the net. It makes the most sense to handle every filter as a part-
selected filter.
When releasing a net, the release needs to propagate the driven
value. When releasing a variable, the driven value must be set
to the previously forced value.
Handle the case where the input to a net is a constant. Since nets
do not exist anymore as nodes in their own right, we need to create
a driver to drive a net from a constant.
Handle forward references of net inputs.
Two small fixes: Threads should load signal values from signal_value
objects, not signal functors, and the force method should not run
its value through the filter.
Take wires out of the signals/variables and move them into a filter
instead. This is a big shift, and finally starts us on the path to
divide wires out of signals.
We want the entire force/release subsystem to only reference the
vvp_net_t or vvp_net_fil_t objects in a net. This gives us the
latitude to take wire implementations out of the vvp_net_fun classes.
These methods are type specific, but the code that invokes them
get at them from pointers to filter objects, so it makes sense to
make them abstract methods of the vvp_net_fil_t class.
This is moving towards moving force/release out of the signal
class. The end-game is to remove all of the wire implementation
out of the functor and into the filter. Variables will remain in
the functor.
Move the vvp_vpi_callback to the vvp_net_sig.h header file, and
collapse some useless hierarchy. (Specifically, all callbackable
items are also wordable.)
Move the run_vpi_callback invocation for wires/variables from the
output generator to the newly implemented filter object. This is
starting to get the filter class working.
the vvp_net.h header file is getting pretty huge. This divides
the obviously separable signal functor code out into its own
header and source files.
Also, fill out the use of the filter member of the vvp_net_t
object. Test the output of the vvp_net_t against the filter.
The out pointer of a vvp_net_t object is going to be a bit more
sophisticated when we redo the handling of net signals. Take a step
towards this rework by making the pointer private and implementing
methods needed to access it.
This patch adds code to free most of the memory when vvp
finishes. It also adds valgrind hooks to manage the various
memory pools. The functionality is enabled by passing
--with-valgrind to configure. It requires that the
valgrind/memcheck.h header from a recent version of
valgrind be available. It check for the existence of this
file, but not that it is new enough (version 3.1.3 is known
to not work and version 3.4.0 is known to work).
You can still use valgrind when this option is not given,
but you will have memory that is not released and the
memory pools show as a single block.
With this vvp is 100% clean for many of the tests in the
test suite. There are still a few things that need to be
cleaned up, but it should be much easier to find any real
leaks now.
Enabling this causes a negligible increase in run time and
memory. The memory could be a problem for very large
simulations. The increase in run time is only noticeable on
very short simulations where it should not matter.
Finish cleaning up shadowed variables, flagged by turning on -Wshadow.
No intended change in functionality. Patch looks right, and is tested
to compile and run on my machine. No regressions in the test suite.
This is the end of the simple, coordination-free patches.
The remaining shadows are special cases that will need extra attention.
This patch fixes a number of problems related to the divide and
modulus operators.
The net version (CA) of modulus did not support a signed version.
Division or modulus of a value wider than the machine word did
not correctly check for division by zero and return 'bx.
Fixed a problem in procedural modulus. The sign of the result is
only dependent on the L-value.
Division or modulus of a signed value that was the same width as
the machine word was creating an incorrect sign mask.
Division of a signed value that would fit into a single machine
word was not checking for division by zero.
Division or modulus of a wide value was always being done as
unsigned.
Added a negative operator for vvp_vector2_t. This made
implementing the signed wide division and modulus easier.
Support arrays of realtime variable arrays and net arrays. This
involved a simple fix to the ivl core parser, proper support in
the code generator, and rework the runtime support in vvp.
This patch splits any VVP net functor that needs to access both
statically and automatically allocated state into two sub-classes,
one for handling operations on statically allocated state, the
other for handling operations on automatically allocated state.
This undoes the increase in run-time memory use introduced when
automatic task/function support was first introduced.
This patch also fixes various issues with event handling in automatic
scopes. Event expressions in automatic scopes may now reference either
statically or automatically allocated variables or arrays, or part
selects or word selects thereof. More complex expressions (e.g.
containing arithmetic or logical operators, function calls, etc.) are
not currently supported.
This patch introduces some error checking for language constructs
that may not reference automatically allocated variables. Further
error checking will follow in a subsequent patch.
Start cleaning up shadowed variables, flagged by turning on -Wshadow.
No intended change in functionality. Patch looks right, and is tested
to compile and run on my machine. YMMV.
The previous patch (commit 8b0ca902a6)
dealing with the possibilities of (unsigned long) and (vvp_time64_t)
being either the same or different managed to redefine UL_AND_TIME64_DIFF
in the 64-bit case. This does, of course, trigger a compiler warning.
That warning is repeated on every .cc file with a #include "config.h",
which is to say, just about every file.
This patch inverts the sense of the preprocessor conditional, calling
it UL_AND_TIME64_SAME. No more warnings!
A variable that is used to set the delay of a .delay statement
must be scaled to match the local units and for real values
rounded using the precision. This value is then converted to
the simulation precision.
Assume that anything that is strength aware already handles a
recv_vec8_pv and make the default function convert the bits
to a vec4 and then call recv_vec4_pv with this new value.
This patch makes .part/pv strength aware, resolv vec8_pv
aware. vvp_net_fun_t adds vec8_pv as a virtual function
with an appropriate error default. vvp_fun_signal should
full support vec8_pv (not tested and may not be needed).
This patch adds .cast/int and updates .cast/real to act as a local
(temporary) net and to support either a signed or unsigned input.
The vvp_vector4_t class not can convert an arbitrarily sized double
to a vector value. This removes the restriction of lround().
Also document the new statements.
First, handle the trivial (but possibly common) resolution cases in
inlined code, and only call the complete function for the complicated
cases. Then clean up the complex function for readability, and account
for the constraints that the front-end function established.
Arrays of vvp_vector4_t values redundantly store some fields in every
word. Create a special type that stores vvp_vector4_t values in a form
that does not duplicate the width of all the items. This can save a lot
of space when big memories are simulated.
The vvp_net_fun_t objects, and derived objects, are small, and are
created in large quantities. Tightly pack them into permanently
allocated space in order to save on system allocation overhead, and
thus save overall on memory.
The vvp_net_t objects are never deleted, so overload the new operator
to do a more space efficient permanent allocation.
The %assign/v instruction copied the vvp_vector4_t object needlessly
on its way to the scheduler. Eliminate that duplication.(cherry picked from commit d0f303463d)
The vvp_vector8_t constructor and destructor involve memory allocation
so it is best to pass these objects by reference as much as possible.
Also have the islands take more care not to perform resolution if the
inputs aren't really different.
NOTE: This is a port of commit 2f4e5bf5b6
from the "performance" branch, without the resolver scheduling changes.
This was causing test suite variances with pr1820472.v. It looks like
there might be a race in that program anyhow, but for now leave out the
resolver scheduling changes so that the rest of this commit can go in.
Fold the bi-directional part select into the pass switch (tran) support
so that it can be really bi-directional. This involves adding a new
tranvp device that does part select in tran islands, and reworking the
tran island resolution to handle non-identical nodes. This will be needed
for resistive tran devices anyhow.
The vvp_island classes are added, as well as support for tranif nodes
that use this concept. The result is a working implementation for
tranif0 and tranif1.
In the process, the symbol table functions were cleaned up and made
into templates for better type safety, and the vvp_net_ptr_t was
generalized so that it can be used by the branches in the island
implementation.
Also fix up the array handling to use the better symbol table support,
and to remember to clear its own table when linking is done.
The AND and OR operators for vvp_bit4_t are slightly tweaked to be
lighter and inlinable.
The vvp_vector4_t::set_bit is optimized to do less silly mask fiddling.
When processing wide vectors of these operations, it pays to process
them as vectors. This improves run-time performance. Have the run time
select vectorized or not based on the vector width.
These instructions can take advantage of the much optimized
vector_to_array function to do their arithmetic work quickly and
punt on X very quickly if needed. This helps some benchmarks.
Functions like $monitor need to attach callbacks to array words if
those words are to be monitored. Have the array hold all the callbacks
for words in the array, under the assumption that the monitored words
are sparse.
The vvp_vector4_t often receives the results of vector arithmetic.
Add an optimized method for setting that data into the vector. Take
into account that arithmetic results have no X/Z bits, etc.
By slightly altering the vvp_bit4_t encoding, a few simple
optimizations become possible. By making Z==2 and X==3, the
conversion from X/Z to X is a simple shift-or, and this can
be used to reduce the size of some of the bit4 operators.
The vvp_vector4_t holds 4-value logic. This patch changes the encoding
of 4-value bits in the vector to use separate A- and B bit vectors,
with the B- vector signaling the A- bits that are not 0/1. This
allows rapid conversion to 2-value logic, and rapid tests for X
and Z values.
This patch adds the ability to assign/deassign a bit or part select.
It also cleans up the code and fixes some problem in the forcing of
strength aware nets.
This patch adds functionality to do a bit or part select release
when a constant value is forced to the net/register. It also adds an
error message when the user tries to force a signal to a bit/part
select. This is not currently handled by the run time, so is now
caught in the compiler (tgt-vvp). Where when this functionality is
needed, it will be easy to know what to do instead of trying to track
down some odd runtime functionality.
What this all means is that you can force a signal to an entire
signal or you can force a constant to any part of a signal (bit,
part or entire) and release any of the above. Technically the
release of a constant value does not have to match the force.
The runtime verifies that if you are releasing a signal driver
it is being done as a full release. I don't see an easy way to
check this in the compiler.
To fix the signal deficiencies we need to rework the force_link
code to allow multiple drivers and partial unlinking. Much of
this is in the runtime, but the %force/link operator may also
need to be changed like I did to the %release opcode.
This patch adds the power operator for signed bit based values
in a continuous assignment. It also fixes a few other power
expression width problems. The expression width is still not
calculated correctly, since the correct method can produce huge
possible bit widths. The result is currently limited to the width
of the native long. This is because lround() is used to convert
from a double to an integer. A check in the code generator protects
the runtime from this limitation.
This patch adds the power operator for unsigned bit based values
in a continuous assignment. It also refactors the power code for
normal expressions and continuous assignments.
This patch adds bit based power support to normal expressions.
It also pushes the constant unsigned bit based calculation to
the runtime until the bit based method can be copied to the
compiler. Continuous assignments also need to use this type
of calculation.
Allow user defined functions to take real value arguments and return
real value results in net contexts. Use the data type of the nets
attached to the ports to define the data types of the arguments and
return value.
This patch adds a new opcode %load/avp0 that is used to load a
word from an array and add a value to it. %load/vp0 was
changed/fixed to do the summation at the result width not the
vector width. This allows small vectors to index large arrays with
an offset. A few errors in the opcodes.txt file were also fixed.
Where and expression is an immediate value added to a signal value,
it is possible to optimize them to a single instruction that combines
the load with an add at the same time.
Wide division/modulus (more bits than unsigned long) gave incorrect
results when both the divisor and dividend where the same. They also
did not produce an error message when dividing by zero.
more general concept of arrays. The NetMemory and NetEMemory
classes are removed from the ivl core program, and the IVL_LPM_RAM
lpm type is removed from the ivl_target API.