The part select of a vector is converted by the compiler during
elaboration to a 0-based canonical address. But since it is legal
to address bits below the LSB, the canonical address can be negative.
So make the part select base for selecting from signals work with
signed arithmetic and make the code generator generate negative
indices when needed.
Scheduler cells are small objects that come and go in great quantities.
Even though they are allocated and deallocated a lot, they tend to a
steady state quantity, so put together a heap that is unique for each
cell type.
This heap actually saves memory overall because cells are allocated in
chunks, thus eliminating allocator overhead, and they are pulled/pushed
from/to a heap very quickly so that what overhead remains is slight and
bounded.
The vvp_net_t objects are never deleted, so overload the new operator
to do a more space efficient permanent allocation.
The %assign/v instruction copied the vvp_vector4_t object needlessly
on its way to the scheduler. Eliminate that duplication.(cherry picked from commit d0f303463d)
The vvp_vector8_t constructor and destructor involve memory allocation
so it is best to pass these objects by reference as much as possible.
Also have the islands take more care not to perform resolution if the
inputs aren't really different.
NOTE: This is a port of commit 2f4e5bf5b6
from the "performance" branch, without the resolver scheduling changes.
This was causing test suite variances with pr1820472.v. It looks like
there might be a race in that program anyhow, but for now leave out the
resolver scheduling changes so that the rest of this commit can go in.
The __vpiPV objects express themselves as vpiPartSelect objects.
Add support for value change callbacks by attaching the callback
to the signal that we part select from.
The %load/v instruction was doing some spurious resizes of the vector
that comes from the signal. Eliminate those resizes that can be
removed, and optimize some that remain.
This is not a solution to all the problems, but is a better catch-all
then what is currently there. Allow the index field to be a T<> that
accesses the thread to get the address index.
Note that the lexor.lex currently returns the T<> as a T_SYMBOL, and the
users of T_SYMBOL objects need to interpret the meaning. This is
probably not the best idea, in light of all the other *<> formats that
now exist.
If a memory word was accessed before it was defined the
code was returning a zero width vector result. Now it
returns an appropriately sized vector of 'x'.
The memory opcodes %assign/mv, %load/mv and %set/mv
were removed by a previous patch. This one removes
the documentation from opcodes.txt. It also removes
the documentation for the .mem* statements for the
same reason.
Recursive branch resolution was scanning every branch end, even though
many branch ends share ports and need not be repeatedly scanned. Handle
marks and flags to cut off recursion where it is not needed so as to
save much run time.
Conflicts:
tgt-vvp/vvp_scope.c
Note that the draw_net_input.c takes in a lot of the codes that used
to be in vvp_scope.c, so some changes may have been lost.
This patch cleans up the dump routines and adds file and
line number information for errors. It also adds some of
the missing MemoryWord properties so they can now be
dumped and monitored correctly.
This patch adds $simparam and $simparam$str from Verilog-A.
The analog simulator parameters return 0.0 or N/A. The
vvp_cpu_wordsize system function has been moved into the
$simparam call and is now named CPUWordSize.
This patch also starts the factoring of common code in the
vpi directory. Some routines were renamed.
The priv.c file was renamed to sys_priv.c to match the
include file.
System functions can now have strings put to their output.
Fold the bi-directional part select into the pass switch (tran) support
so that it can be really bi-directional. This involves adding a new
tranvp device that does part select in tran islands, and reworking the
tran island resolution to handle non-identical nodes. This will be needed
for resistive tran devices anyhow.
The vvp_island classes are added, as well as support for tranif nodes
that use this concept. The result is a working implementation for
tranif0 and tranif1.
In the process, the symbol table functions were cleaned up and made
into templates for better type safety, and the vvp_net_ptr_t was
generalized so that it can be used by the branches in the island
implementation.
Also fix up the array handling to use the better symbol table support,
and to remember to clear its own table when linking is done.
The multiply runs does not need to do all the combinations of digit
products, because the higher ones cannot add into the result. Fix the
iteration to limit the scan.
Clarify that operands are typically 32bits, and have the code generator
make better use of this.
Also improve the %movi implementation to work well with marger vectors.
Add the %andi instruction to use immediate operands.
Use high radix long division to take advantage of the divide hardware
of the host computer. It looks brute force at first glance, but since
it is using the optimized arithmetic of the host processor, it is much
faster then implementing "fast" algorithms the hard way.
This instruction adds an integer value to the value being loaded. This
optimization uses subarrays instead of the += operator. This is faster
because the value is best loaded into the vector as a subarray anyhow.
The AND and OR operators for vvp_bit4_t are slightly tweaked to be
lighter and inlinable.
The vvp_vector4_t::set_bit is optimized to do less silly mask fiddling.
When processing wide vectors of these operations, it pays to process
them as vectors. This improves run-time performance. Have the run time
select vectorized or not based on the vector width.
Improve vvp_vector4_t methods copy_bits and the part selecting constructor
to make better use of vector words. Eliminate bit-by-bit processing by
these methods to take advantage of host processor words.
Improve vthread_bits_to_vector to use these improved methods and Update
the %load/av and %set/v instructions to take advantage of these changes.
The MinGW system() implementation appears to return the straight
return value instead of the waitpid() like result that more
normal systems return. Because of this just return the system()
result without processing for MinGW compilations.
Older version of the MinGW runtime (pre 3.14) just used the
underlying vsnprintf(). Which has some problems. The 3.14 version
has some nice improvements, but it has a sever bug when processing
"%*.*f", -1, -1, <some_real_value>. Because of this we need to use
the underlying version without the enhancements for now.
snprintf prints %p differently than the other printf routines
so use _snprintf to get consistent results.
Only build the PDF files if both man and ps2pdf exist.
MinGW does not know about the z modifier for %d, %u, etc.
Add some missing Makefile check targets.
These instructions can take advantage of the much optimized
vector_to_array function to do their arithmetic work quickly and
punt on X very quickly if needed. This helps some benchmarks.
Functions like $monitor need to attach callbacks to array words if
those words are to be monitored. Have the array hold all the callbacks
for words in the array, under the assumption that the monitored words
are sparse.
Array words don't have a vpiHandle with a label, so the %vpi_call
needs a special syntac for arguments that reference array words.
This syntax creates an array word reference that persists and can
be used at a VPI object by system tasks.
It is possible for the code generator to create .array/port objects
before the .array object that the port refereces, so use the resolv_list
to arrange for binding during cleanup.
Save tons of space per memory word by not creating a vpi handle for
each and every word of a variable array. (Net arrays still get a
vpiHandle for every word.) The consequence of this is that all
accesses to a variable array need to go through the indexing.
This commit handles the most common places where this impacts, but
there are still problems.
vpi_handle_by_name() was assuming it was always given a valid scope
object. In the context of vpi_chk_error() this is not required and
some users use/abuse the interface by calling the function with invalid
objects expecting a 0 return value. This patch adds an explicit check
for the supported types vpiScope and as an extension vpiModule.
Anything else should be flagged as an error once we have vpi_chk_error()
implemented, but for now it just returns 0.
The abs() function needs to be able to turn -0.0 into 0.0. This proved
to be too clunky (and perhaps impossible) to do with tests and jumps,
so add an %abs/wr opcode to do it using fabs().
The min/max functions need to take special care with the handling
of NaN operands. These matter, so generate the extra code to handle
them.
This patch adds file and line information for parameters and
local parameters. It also adds file/line stubs for signals in
the tgt-* files. It adds the pform code needed to eventually
do genvar checks and passing of genvar file/line information.
It verifies that a genvar does not have the same name as a
parameter/local parameter.
This patch makes sure that objects either support vpiFile
and vpiLineNo or adds dummy code so that a runtime error
will not occur when accessing these properties. It also
returns 1 for the size of real variables and adds a
simplified vpiIndex that matches the Memory interface.
This patch adds code to push the file and line information
for scope objects (modules, functions, tasks, etc.) to the
runtime. For modules, this includes the definition fields.
This patch adds ifnone functionality. It does not produce an
error when both an ifnone and an unconditional simple module
path are given. For this case the ifnone delays are ignored.
Fix handling of cases where multiple specified delays are activated
for a given output. Need to apply the standard selection criteria
that gets the minimum value.
The %sub instruction didn't have the efficent implementation that
the %add instructions used. Update subtraction to use the array
method, so that it gets the same performance benefits.
The vvp_vector4_t often receives the results of vector arithmetic.
Add an optimized method for setting that data into the vector. Take
into account that arithmetic results have no X/Z bits, etc.
On some systems, 1UL<<X will make a mess if X is the size of
an unsigned long. This especially seems to be a problem on i386
systems. Protect those shifts in the vvp_net.cc.
By slightly altering the vvp_bit4_t encoding, a few simple
optimizations become possible. By making Z==2 and X==3, the
conversion from X/Z to X is a simple shift-or, and this can
be used to reduce the size of some of the bit4 operators.
In previous incarnations of the vvp runtime, bit vectors were passed
around as arrays of unsigned char that charried bit4 vectors. That
is no longer used. Remove the last vestiges of that dead code.
Remove dependencies on vvp_bit4_encoding outside of the vvp_net
core types. The table_functor_s class was the worst offfender and
was barely used, so it is now removed completely. There are a few
opcodes in vhtread.cc that also make vvvp_bit4_t encoding
assumptions (and used casts) and those have been fixed. There
were also various VPI interface functions that are fixed.
The vvp_vector4_t holds 4-value logic. This patch changes the encoding
of 4-value bits in the vector to use separate A- and B bit vectors,
with the B- vector signaling the A- bits that are not 0/1. This
allows rapid conversion to 2-value logic, and rapid tests for X
and Z values.
This patch fixes deassign to allow it to unlink from a driver.
It also zeros the cassign_link and force_link pointers after
they have been unlinked. Not doing this will cause an assert
if deassign/release are called multiple times (variable only).
This patch adds the ability to assign/deassign a bit or part select.
It also cleans up the code and fixes some problem in the forcing of
strength aware nets.
Rework the MUXZ and MUXR code to use an enum instead of plain
integers for the select input state. This makes it more obvious
what is actually going on.
This patch adds a flag to the MUXZ object to make sure that it will
run at time zero if needed. If this is not done the default Z result
may not be overridden by an X result.
This patch adds a %assign/av/d opcode. This is a version of %assign/av
that allows a delay expression. Ultimately this allows a dynamically
indexed array to have a delay expression (non-constant delay value).
This patch add (I believe back in) the < and > characters to the
general T_SYMBOL regular expression. The match was also moved after
the special T_SYMBOL cases so that flex would match them correctly.
Flex matching rules are longest or if a tie the first.
vvp did not have the ability to handle real parameters.
This patch fixes that omission. Parameters are only used
by vpi calls to get compile time information.
Threads used to be deleted when they finished processing code.
The problem with this is that some of the code could be
rescheduled to run at rosync ($strobe, etc.). This allowed the
thread data the code depended on to be reaped too soon. This
patch uses a new queue to schedule thread deletion. The queue
is processed after rosync has finished.
This patch makes the muxz and muxr functors schedule events
instead of directly calling vvp_send_*(). This prevents the
code from going into an infinite loop when the output feeds
back to the select.
This patch adds functionality to do a bit or part select release
when a constant value is forced to the net/register. It also adds an
error message when the user tries to force a signal to a bit/part
select. This is not currently handled by the run time, so is now
caught in the compiler (tgt-vvp). Where when this functionality is
needed, it will be easy to know what to do instead of trying to track
down some odd runtime functionality.
What this all means is that you can force a signal to an entire
signal or you can force a constant to any part of a signal (bit,
part or entire) and release any of the above. Technically the
release of a constant value does not have to match the force.
The runtime verifies that if you are releasing a signal driver
it is being done as a full release. I don't see an easy way to
check this in the compiler.
To fix the signal deficiencies we need to rework the force_link
code to allow multiple drivers and partial unlinking. Much of
this is in the runtime, but the %force/link operator may also
need to be changed like I did to the %release opcode.
vpi_put_value can mimic force and release with vpiForceFlag and
vpiReleaseFlag flags to the vpi_put_value call. With this patch,
the infrastructure is added to allow the flags argument to be passed
to the dispatched put_value function, and for signals handle those
flags as force/release of a net.
This patch reworks much of the ternary code to short circuit when
possible and supports real values better. It adds a blend operator
for real values that returns 0.0 when the values differ and the value
when they match. This deviates slightly from the standard which
specifies that the value for reals is always 0.0 when the conditional
is 'bx. There are also a couple bug fixes.
These fixes have not been ported to continuous assignments yet.
Ternary operators used at compile time and in procedural assignments
should be complete (short circuit and support real values).
draw_number_bool64() in tgt-vvp/eval_bool.c was using %ix/load to
load immediate values into registers greater than three. The problem
was that of_IX_LOAD() in vvp/vthread.cc was masking off the upper
bits. This was putting the results in the wrong register. This patch
removes the bit masking from of_IX_LOAD() and updates the %ix/load
documentation.
This patch fixes two problems. The first is that thr_check_addr()
was being used inconsistently. It should be passed a real address,
but the resize of the vector should be at least one more than this
address. The extra and unneeded CPU_WORD_BITS was also removed
from the routine.
The second problem involved an invalid memory access in
vvp_vector4_t::set_vec() when the vector being copied was an integer
multiple of the machine word width. Under this condition there would
be no remaining bits that needed to be copied but the routine was always
trying to copying some remaining bits. This code is now only executed
when there is a remainder.
Neither of these appear to be causing runtime problems. The second one
was found with valgrind. The first were found while tracking down the
second problem.
This patch adds the power operator for signed bit based values
in a continuous assignment. It also fixes a few other power
expression width problems. The expression width is still not
calculated correctly, since the correct method can produce huge
possible bit widths. The result is currently limited to the width
of the native long. This is because lround() is used to convert
from a double to an integer. A check in the code generator protects
the runtime from this limitation.
This patch adds the power operator for unsigned bit based values
in a continuous assignment. It also refactors the power code for
normal expressions and continuous assignments.
This patch adds bit based power support to normal expressions.
It also pushes the constant unsigned bit based calculation to
the runtime until the bit based method can be copied to the
compiler. Continuous assignments also need to use this type
of calculation.
vvp/main.cc was including asm/page.h on GNU/Linux
systems, though that file does not often exist and
is not necessary.
Signed-off-by: Michael Witten <mfwitten@mit.edu>
This patch makes sure the delay is calculated correctly when only some
of the bits change and you are comparing against the initial (t0) value.
Basically you have to check the initial value against all the bits in
the new signal not just the first bit since the order that bits change
is not deterministic.
Allow user defined functions to take real value arguments and return
real value results in net contexts. Use the data type of the nets
attached to the ports to define the data types of the arguments and
return value.
When conditional delays are in use, it is sometimes possible for there
to be no delays available for a given input event. In that case, skip
the delay processing for that case instead of crashing.
Rework the encoding of a real value in the Cr<> label to be similar to
the format used by the %loadi/wr instruction. This mantissa-expoment
format better carries all the bits of the desired real value with
plenty of fidelity and range.
This patch adds a new opcode %load/avp0 that is used to load a
word from an array and add a value to it. %load/vp0 was
changed/fixed to do the summation at the result width not the
vector width. This allows small vectors to index large arrays with
an offset. A few errors in the opcodes.txt file were also fixed.
This patch schedules the input value change to .sfuns calls to avoid
potential recursion problems. It also cleans up two put_value calls
that did unneeded loops when putting VectorVals and the width was
greaten than 32 bits.
This patch adds a new flag to vvp "-n" that can be used to make
$stop and hence <Control-C> act like $finish. This may be desired
when using vvp in a non-interactive environment.
tgt-vvp/eval_expr.c uninitialized variables
vpi/sys_display.c uninitialized variables
vvp/vpi_priv.cc deprecated string constant usage
vvp/vpi_vthr_vector.cc deprecated string constant usage
the last entry invokes vpip_name_string() and uses const char *
in the same way as the other 9 callers in vvp/*.cc, the only
difference is that the argument is static instead of computed.
Rework the ivl_file_table_* interface to be more generic and easier
to use. Also all the vvp examples except for memory.vvp have been
fixed to run correctly with the current vvp. Someone with a bit more
experience will need to fix memory.vvp.
Add the vpiFile and vpiLineNo properties to system functions.
Most other objects have stubs that return "N/A"/0. Interactive
functions (called from the debugger) use <interactive> for the
file name.
With this change, local symbols are not emitted in the vvp target,
but are marked as local. When thus marked, the vvp run time does not
offfer any VPI access and the signals (net or var) are effectively
invisible.
The recent changes to vpi_get_str had a minor problem that was
causing a core dump when trying to get the vpiFullName of an
arrayed signal. Basically vpi_get_str is not reentrant so the
name must be duplicated. It must also be duplicated to work with
the free in the code (do not free a result_buf).
Gets rid of a few warning: deprecated conversion from string
constant to 'char*', follows IEEE 1364-2001C 27.10 in more cases,
and fixes at least one real bug (look at the previous use of
strdup/strcat in real_var_get_str() and signal_get_str()).
This patch fixes a number of problems associated with the various dumpers.
1. It catches the problem uncovered in pr1809904 and prevents the core
dumps from happening. It prints a warning message if you try to change
the file after the program has started dumping data to the file.
2. It makes all the dumpers work in the same manner. Before the VCD
$dumpfile was processed at compile time, but the LXT/LXT2 versions
did this at run time. The correct place for this is the run time so
that you can pass the task a calculated name.
3. All the dumpers use common compiletf routines located in vcd_priv.c
4. Make the LXT/LXT2 $dumpfile commands only get the file name and
let $dumpvars actually open the file. This matches the VCD code.
5. Fix the $dumpfile code to allow calculated file names.
6. Make dumpvars without a scope/variable match all toplevel modules
not just the toplevel module the $dumpvars was located in. This
now matches the standard (2001).
7. Simplify the no dumper code (vcdoff.c) and add missing functions.
8. Cleanup the code and messages.
9. vvp can take -none for no dumper.
This patch fixes another minor problem introduced by the process
end of simulation events. Specifically if the compilation has
indicated we should not run do not even start the main event loop.
Parse SDF file annotations of edge sensitive delay paths.
Add vpi support for getting the specified edge sensitivity of
an edge sensitive path, and annotate paths with proper attention
to the edge that is specified for the path.
When loading a part select from the least significant bits, it is
OK to use the %load/v instruction to strip the high bits off. This
allows the zero-based part select to work in one step, without
loading excess bits.
The %ix/getv instruction loads an integer register directly from
a signal vector. This is an optimization that an index register
is loaded from an expression is only a signal. It avoids the thread
vector space.
Use the new array VPI functionality (vpiIndex, vpiParent, vpiArray)
to dynamically generate array word names. The old patch to implement
was mostly reverted.
Add the array related VPI calls. These will be used to generate
the array word name only as needed to conserve space. This patch
also makes scanmem3 from the vpi test work correctly after a
slight gold file update.
Where and expression is an immediate value added to a signal value,
it is possible to optimize them to a single instruction that combines
the load with an add at the same time.
The functor counters were left over from the v0.8 release. Rework
the counters to be relevent to the current state of vvp.
Signed-off-by: Stephen Williams <steve@icarus.com>
This patch adds the ability to dump array words. The words are only
dumped if they appear in a $dumpvars() statement (they are not dumped
by default). The name used for the word is <array_name>[<index>], so
you can get unexpected name conflicts.
There is also a slight increase in the memory requirements since each
array word now keeps its own name information. In the future we would
like to change this, but that is a much larger rewrite of the array
code in vvp.
This patch also needs the "Prefix escaped identifier ..." patch to
work correctly (the array word name is an escaped identifier).
Implement extended vvp command line options to control the amount
of detail that the sdf annotator emits while parsing the source
file.
Signed-off-by: Stephen Williams <steve@icarus.com>
Pass parsed SDF delays into the vvp run time as vpiScaledRealTime
variables, and handling the mapping of 2-values to 12-delays.
Signed-off-by: Stephen Williams <steve@icarus.com>
This was causing a core dump for a number of the vpi test cases.
The problem was that they did not need the simulation time, so it
was being set to NULL. The problem was the EndOfSimulation callback
was always setting cb_data.time to the current time. A big problem
when the pointer was NULL. The solution is to only set the time
when the pointer is not NULL
String values are known to be 2-value bits, so they are natural
candidates for using the movi instruction to load the value in
far fewer instructions. With this patch, up to 16bits (two bytes)
at a time can be loaded per instruction.
Signed-off-by: Stephen Williams <steve@icarus.com>
A previous patch I submitted to try and keep the $finish time
events missed the case of an infinite loop that did not advance
the micro time step which then prevents the debugger from
generating a finish (you can interrupt an infinite loop, but you
could not finish out of it). This patch adds a check for a
finish after each debugger call to fix this problem.
When nets are forced by non-constant expressions, the value is linked
to the destination net through a force_link. This patch adds the code
needed to unlink a force/link so that release and new force/links can
work. This fixes pr1735836.
Signed-off-by: Stephen Williams <steve@icarus.com>
This patch adds marks at simulation end if needed and enabled (on
and not over the limit) for vcd/lxt/lxt2 files. End of simulation
callbacks also now have the correct simulation time. The default
file names for lxt and lxt2 now match the vcd file except for the
extension. lx2 is also an alias for lxt2 and the default lxt2 file
extension is lx2. This matches what GTKWave expects. Any lxt2
diagnostic output print LXT2 instead of LXT to make it clear which
dumper you are using.
This patch fixes/enhances the array part select code. It also
verifies that any lval array index in a continuous assignment
is a constant value. Also, %set/av now uses index register 1 as
described in the documentation (as a bit offset).
Add/rework VPI access to modpaths. This allows VPI code access to
paths described in specify blocks. In the process, the VPI structure
has been reworked to conform to the IEEE1364 standard.
Signed-off-by: Stephen Williams <steve@icarus.com>
Cleanup klunky implementation of vpi_get/put_delays for modpath
objects. Remove some useless members.
Signed-off-by: Stephen Williams <steve@icarus.com>
Wide division/modulus (more bits than unsigned long) gave incorrect
results when both the divisor and dividend where the same. They also
did not produce an error message when dividing by zero.
The modpath source node defines the modpath object, and carries the
nodes for the source expression. The modpath outputs are references
by pointers to the vpiModPath that is not in itself a vpi object
any more. This makes the VPI view of a module path look like the
source-destinaiton pair that is the IEEE1364 description of the
modpath.
Add vvp support for modpath path term outputs. This also introduces
the concept of path terms and moves towards the path term in general
for getting at the endpoits of a modpath.
Clean up rather poorly written modpath vpi support, fixing the
parse of the modpath syntax element to not use pointless globals.
Collect the modpath code into the delay.cc file instead of the
inapropriate vpi_signal.cc source file.
Signed-off-by: Stephen Williams <steve@icarus.com>
signed compare in proceedural code was comparing the absolute
value if both operands were negative. Wrong!
Signed-off-by: Stephen Williams <steve@icarus.com>
Implement compare-immediate instructions and generate code to use
these new instructions to improve runtime performance.
Signed-off-by: Stephen Williams <steve@icarus.com>
This patch adds ivl_scope_time_precision() to the compiler which can
be used to extract the local scope precision. tgt-stub and tgt-vvp
have been modified to use this new function and output a value that
is appropriate. The vvp runtime has been altered to use this new
data which is accessed with the vpip_time_precision_from_handle()
function. vpiTimePrecision uses this function to return the correct
precision.
Add support for accessing the modpath nodes via PLI,
and add support for the vpi_set_delays and vpi_put_delays
functions to set the delays on those paths.
- GSoC 2007