This patch changes all the iterator code to use a prefix ++ instead
of postfix since it is more efficient (no need for a temporary). It
is likely that the compiler could optimize this away, but lets make
it efficient from the start.
This patch adds -Wextra to the compilation flags for C++ files in
the vvp and vpi subdirectories. It also fixes all the problems
found while adding -Wextra. This mostly entailed removing some of
the unused arguments, removing the name for others and using the
correct number of initializers.
This patch adds a few missing initializations to various constructors
in the vvp directory. It also enhances the array alias code to copy
more values from the aliased array.
The functions (malloc, free, etc.) that used to be provided in
malloc.h are now provided in cstdlib for C++ files and stdlib.h for
C files. Since we require a C99 compliant compiler it makes sense
that malloc.h is no longer needed.
This patch also modifies all the C++ files to use the <c...>
version of the standard C header files (e.g. <cstdlib> vs
<stdlib.h>). Some of the files used the C++ version and others did
not. There are still a few other header changes that could be done,
but this takes care of much of it.
Some new shadow issues have crept in. This patch fixes these new
issues and adds -Wshadow to the normal warning flags to keep any
new occurrences from happening.
Tran islands must do their calculations using the forced values,
if any. But the output from a port must also be subject to force
filtering. It's a little ugly, but hopefully won't hurt the more
normal case.
It is acceptable to call delete on a NULL object. It's also acceptable
to assign the value to zero if it is already zero. Removing the
superfluous if should produce slightly better code since there is no
conditional to deal with except for what is likely in the delete
implementation, but that should be highly optimized.
This patch cleans up some style issues: no need to check that a value
is defined before freeing or deleting it, use C++ style casts, make
sure to NULL terminate strncpy(), empty() is faster than size() for
size == 0 or size >= 0 checks, re-scope some variables, etc.
The process of inverting and copying can be collapsed into a single
operation that should run a little faster. Also, inverting readily
vectorizes itself. I've also possibly reduced useless vvp_not_fun
iterations.
Also, include some minor tweaks to the vvp_vector8_t handling of
vector copies. This is more to clean up code, although it should
slightly improve performance as well.
The vvp thread word storage had previously been changed to always store
64-bit values, but some instructions still only operate on native long
values. This patch ensures all instructions that modify thread words
support 64-bit values.
The Cygwin compiler is a bit picky. This patch adds some casts
to remove compilation warnings. In the past I have had warnings
off because of problems with the STL, but for this directory we
may as well squash as many warings as we can. It also does not
recognize that an assert(0) or assert(false) ends a routine so
it complains about no return at end of function or variables
not being defined.
If the source and destination of a subvector to be moved in the
vvp_vector4_t::mov method is nicely word aligned, and the transfer
size is a full word, then we ar much better off handling that as
a special case. This makes the move faster, and also avoids some
shift overflow errors.
For the %mov instruction, implement a vvp_vector4_t::mov method to
manipulate the thread vector directly.
For the %load/v instruction, rework the vec4_value() methods to
avoid creating vvp_vector4_t temporaries, and therefore reduce the
copy overhead.
This patch adds code to cleanup system functions driving a
continuous assignment. It also modifies the user function
cleanup to not interfere with this. It also adds a count
of the nets and signals that were not cleaned up that is
pnly printed when running valgrind. They are not flagged
y valgrind since they are pool managed objects. There are
a few signals that need to be cleaned up and local nets
are missed so there are a lot of nets.
filters need to be able to cope with parts of vectors moving through
the net. It makes the most sense to handle every filter as a part-
selected filter.
Handle the case where the input to a net is a constant. Since nets
do not exist anymore as nodes in their own right, we need to create
a driver to drive a net from a constant.
Handle forward references of net inputs.
This patch fixes a number of bugs related to real variable and net
arrays. Specifically the following:
1. When iterating over (scanning) a net array start at base index 0
not index 1.
2. Don't fail when iterating over (scanning) a real variable array.
3. Run the array_word_change() routine when a real variable array
word is changed. This allows array ports and value change
callbacks to work correctly.
4. Update the array_word_change() routine to work with real variable
arrays.
5. Update the array port code to support real variable arrays.
6. find_name() needs to also iterate over net array words just like
memory array words.
7. Initialize all real array words to 0.0 when the array is created.
Someone got a bit too creative in reducing the original equations
I wrote to handle this. This patch reverts the previous code and
uses my original equations. This passes for both wide and narrow
vectors. The equations are slightly more complicated, but the old
z2x conversion had some overhead. I would expect the time to be
about the same, but you now get the correct results.
This is moving towards moving force/release out of the signal
class. The end-game is to remove all of the wire implementation
out of the functor and into the filter. Variables will remain in
the functor.
Move the vvp_vpi_callback to the vvp_net_sig.h header file, and
collapse some useless hierarchy. (Specifically, all callbackable
items are also wordable.)
Move the run_vpi_callback invocation for wires/variables from the
output generator to the newly implemented filter object. This is
starting to get the filter class working.
the vvp_net.h header file is getting pretty huge. This divides
the obviously separable signal functor code out into its own
header and source files.
Also, fill out the use of the filter member of the vvp_net_t
object. Test the output of the vvp_net_t against the filter.
The out pointer of a vvp_net_t object is going to be a bit more
sophisticated when we redo the handling of net signals. Take a step
towards this rework by making the pointer private and implementing
methods needed to access it.
This patch adds code to free most of the memory when vvp
finishes. It also adds valgrind hooks to manage the various
memory pools. The functionality is enabled by passing
--with-valgrind to configure. It requires that the
valgrind/memcheck.h header from a recent version of
valgrind be available. It check for the existence of this
file, but not that it is new enough (version 3.1.3 is known
to not work and version 3.4.0 is known to work).
You can still use valgrind when this option is not given,
but you will have memory that is not released and the
memory pools show as a single block.
With this vvp is 100% clean for many of the tests in the
test suite. There are still a few things that need to be
cleaned up, but it should be much easier to find any real
leaks now.
Enabling this causes a negligible increase in run time and
memory. The memory could be a problem for very large
simulations. The increase in run time is only noticeable on
very short simulations where it should not matter.
This patch fixes a number of problems related to the divide and
modulus operators.
The net version (CA) of modulus did not support a signed version.
Division or modulus of a value wider than the machine word did
not correctly check for division by zero and return 'bx.
Fixed a problem in procedural modulus. The sign of the result is
only dependent on the L-value.
Division or modulus of a signed value that was the same width as
the machine word was creating an incorrect sign mask.
Division of a signed value that would fit into a single machine
word was not checking for division by zero.
Division or modulus of a wide value was always being done as
unsigned.
Added a negative operator for vvp_vector2_t. This made
implementing the signed wide division and modulus easier.
Support arrays of realtime variable arrays and net arrays. This
involved a simple fix to the ivl core parser, proper support in
the code generator, and rework the runtime support in vvp.
This patch splits any VVP net functor that needs to access both
statically and automatically allocated state into two sub-classes,
one for handling operations on statically allocated state, the
other for handling operations on automatically allocated state.
This undoes the increase in run-time memory use introduced when
automatic task/function support was first introduced.
This patch also fixes various issues with event handling in automatic
scopes. Event expressions in automatic scopes may now reference either
statically or automatically allocated variables or arrays, or part
selects or word selects thereof. More complex expressions (e.g.
containing arithmetic or logical operators, function calls, etc.) are
not currently supported.
This patch introduces some error checking for language constructs
that may not reference automatically allocated variables. Further
error checking will follow in a subsequent patch.
Start cleaning up shadowed variables, flagged by turning on -Wshadow.
No intended change in functionality. Patch looks right, and is tested
to compile and run on my machine. YMMV.
alloc_instance for real values was just passing a new double
to be added to the context items. The new constructor does
not set the default value so we need to do that manually.
The previous patch (commit 8b0ca902a6)
dealing with the possibilities of (unsigned long) and (vvp_time64_t)
being either the same or different managed to redefine UL_AND_TIME64_DIFF
in the 64-bit case. This does, of course, trigger a compiler warning.
That warning is repeated on every .cc file with a #include "config.h",
which is to say, just about every file.
This patch inverts the sense of the preprocessor conditional, calling
it UL_AND_TIME64_SAME. No more warnings!
A variable that is used to set the delay of a .delay statement
must be scaled to match the local units and for real values
rounded using the precision. This value is then converted to
the simulation precision.
The right shift of vvp_vector2_t needs to
account for and mask off shifted bits. Otherwise
there will be unexpected results after
a vvp_vector2_t::trim method.
Assume that anything that is strength aware already handles a
recv_vec8_pv and make the default function convert the bits
to a vec4 and then call recv_vec4_pv with this new value.
The double to vvp_vector4_t constructor was not using the correct
declaration for the bit words. This worked as long as unsigned and
unsigned long were the same size (usually).
The new real to int conversion was incorrectly setting the
bits for minus infinity to all ones. This is incorrect in a
two's complement encoding where the largest negative number
would be a leading 1 followed by an infinite number of zeros.
This patch makes .part/pv strength aware, resolv vec8_pv
aware. vvp_net_fun_t adds vec8_pv as a virtual function
with an appropriate error default. vvp_fun_signal should
full support vec8_pv (not tested and may not be needed).
This patch adds .cast/int and updates .cast/real to act as a local
(temporary) net and to support either a signed or unsigned input.
The vvp_vector4_t class not can convert an arbitrarily sized double
to a vector value. This removes the restriction of lround().
Also document the new statements.
First, handle the trivial (but possibly common) resolution cases in
inlined code, and only call the complete function for the complicated
cases. Then clean up the complex function for readability, and account
for the constraints that the front-end function established.
Arrays of vvp_vector4_t values redundantly store some fields in every
word. Create a special type that stores vvp_vector4_t values in a form
that does not duplicate the width of all the items. This can save a lot
of space when big memories are simulated.
The schedule_assign_plucked_vector is a better way to implement the
schedule_assign_vector, or at least no worse, so remove the now
redundent schedule_assign_vector.
The vvp_net_fun_t objects, and derived objects, are small, and are
created in large quantities. Tightly pack them into permanently
allocated space in order to save on system allocation overhead, and
thus save overall on memory.
The vvp_net_t objects are never deleted, so overload the new operator
to do a more space efficient permanent allocation.
The %assign/v instruction copied the vvp_vector4_t object needlessly
on its way to the scheduler. Eliminate that duplication.(cherry picked from commit d0f303463d)
The vvp_vector8_t constructor and destructor involve memory allocation
so it is best to pass these objects by reference as much as possible.
Also have the islands take more care not to perform resolution if the
inputs aren't really different.
NOTE: This is a port of commit 2f4e5bf5b6
from the "performance" branch, without the resolver scheduling changes.
This was causing test suite variances with pr1820472.v. It looks like
there might be a race in that program anyhow, but for now leave out the
resolver scheduling changes so that the rest of this commit can go in.
The %load/v instruction was doing some spurious resizes of the vector
that comes from the signal. Eliminate those resizes that can be
removed, and optimize some that remain.
The AND and OR operators for vvp_bit4_t are slightly tweaked to be
lighter and inlinable.
The vvp_vector4_t::set_bit is optimized to do less silly mask fiddling.
When processing wide vectors of these operations, it pays to process
them as vectors. This improves run-time performance. Have the run time
select vectorized or not based on the vector width.
Improve vvp_vector4_t methods copy_bits and the part selecting constructor
to make better use of vector words. Eliminate bit-by-bit processing by
these methods to take advantage of host processor words.
Improve vthread_bits_to_vector to use these improved methods and Update
the %load/av and %set/v instructions to take advantage of these changes.
These instructions can take advantage of the much optimized
vector_to_array function to do their arithmetic work quickly and
punt on X very quickly if needed. This helps some benchmarks.
The vvp_vector4_t often receives the results of vector arithmetic.
Add an optimized method for setting that data into the vector. Take
into account that arithmetic results have no X/Z bits, etc.
On some systems, 1UL<<X will make a mess if X is the size of
an unsigned long. This especially seems to be a problem on i386
systems. Protect those shifts in the vvp_net.cc.
By slightly altering the vvp_bit4_t encoding, a few simple
optimizations become possible. By making Z==2 and X==3, the
conversion from X/Z to X is a simple shift-or, and this can
be used to reduce the size of some of the bit4 operators.
The vvp_vector4_t holds 4-value logic. This patch changes the encoding
of 4-value bits in the vector to use separate A- and B bit vectors,
with the B- vector signaling the A- bits that are not 0/1. This
allows rapid conversion to 2-value logic, and rapid tests for X
and Z values.
This patch adds the ability to assign/deassign a bit or part select.
It also cleans up the code and fixes some problem in the forcing of
strength aware nets.
This patch adds functionality to do a bit or part select release
when a constant value is forced to the net/register. It also adds an
error message when the user tries to force a signal to a bit/part
select. This is not currently handled by the run time, so is now
caught in the compiler (tgt-vvp). Where when this functionality is
needed, it will be easy to know what to do instead of trying to track
down some odd runtime functionality.
What this all means is that you can force a signal to an entire
signal or you can force a constant to any part of a signal (bit,
part or entire) and release any of the above. Technically the
release of a constant value does not have to match the force.
The runtime verifies that if you are releasing a signal driver
it is being done as a full release. I don't see an easy way to
check this in the compiler.
To fix the signal deficiencies we need to rework the force_link
code to allow multiple drivers and partial unlinking. Much of
this is in the runtime, but the %force/link operator may also
need to be changed like I did to the %release opcode.
This patch fixes two problems. The first is that thr_check_addr()
was being used inconsistently. It should be passed a real address,
but the resize of the vector should be at least one more than this
address. The extra and unneeded CPU_WORD_BITS was also removed
from the routine.
The second problem involved an invalid memory access in
vvp_vector4_t::set_vec() when the vector being copied was an integer
multiple of the machine word width. Under this condition there would
be no remaining bits that needed to be copied but the routine was always
trying to copying some remaining bits. This code is now only executed
when there is a remainder.
Neither of these appear to be causing runtime problems. The second one
was found with valgrind. The first were found while tracking down the
second problem.
This patch adds the power operator for signed bit based values
in a continuous assignment. It also fixes a few other power
expression width problems. The expression width is still not
calculated correctly, since the correct method can produce huge
possible bit widths. The result is currently limited to the width
of the native long. This is because lround() is used to convert
from a double to an integer. A check in the code generator protects
the runtime from this limitation.
This patch adds the power operator for unsigned bit based values
in a continuous assignment. It also refactors the power code for
normal expressions and continuous assignments.
This patch adds bit based power support to normal expressions.
It also pushes the constant unsigned bit based calculation to
the runtime until the bit based method can be copied to the
compiler. Continuous assignments also need to use this type
of calculation.
Allow user defined functions to take real value arguments and return
real value results in net contexts. Use the data type of the nets
attached to the ports to define the data types of the arguments and
return value.
This patch adds a new opcode %load/avp0 that is used to load a
word from an array and add a value to it. %load/vp0 was
changed/fixed to do the summation at the result width not the
vector width. This allows small vectors to index large arrays with
an offset. A few errors in the opcodes.txt file were also fixed.