Original version would iterate exhaustively, even when it was not
necessary.
This version seeks to do the minimum amount of iteration work based
on the information available.
There are numerous concerns with the original code from a read through.
The #define TX_MAX_OPEN_FILES 20 is badly named, the txCommand.c uses
fd_set to hold active FD for each device, and makes no attempt to bounds
check the 'fd' or 'fd_set' of any TxAdd1InputDevice() is in range.
The real name of this variable should be TX_MAX_INPUT_DEVICES as it
related to the fixed compile time slots available for devices, each input
device must have at least one active FD and it can be in the range
that fd_set allows, which is 0..1023 on linux.
So based on this being the original intention, due to the way the code is
constructed and API calls made available, the file has been reworked
to allow all these considerations at the same time.
Now the design should be FD clean for FDs in the range 0..1023 instead of
being artificially clamped to the first 20 FDs being valid input devices.
Renaming of variable 'i' to 'fd' when it relates to an fd number, so all
uses of FD_XXXX() should use fd, this helps clarify the different domain
the number relates to.
When 'i' is used it relates to the index into the txInputDevRec array.
A large part of the concerns relate to trying to mix the two numbering
domains interchangeably, just using a different name really clears up
understanding to the reader.
Other items that look like errors TxDelete1InputDevice() will
shuffle-remove entries with no active FD, but it is iterating an array
it is also modifying, so it looks like it would have incorrectly skipped
the next item that should be inspected just after a shuffle-removal
occurred.
The various iterators that works from 0..TX_MAX_OPEN_FILES incorrectly
limited the visibility of the routines to the first 20 FDs instead of
the full FD_SETSIZE the TxAddInputDevice API call allows to be
registered.
Passing in TxAdd1InputDevice with an fd above 19 would have resulted in
everything looking successful until select() was used, then the kernel
would likely not sleep/wait because the input fd_set would look blank due
to being clipped by nfds=TX_MAX_OPEN_FILES (instead of that plus 1).
The use of TX_MAX_OPEN_FILES in select instead of TX_MAX_OPEN_FILES+1 for
the 'nfds' field would have meant a device with fd=19 would not work as
the design intended.
The define definition has been removed from the module public header,
I assume it was there for use by another module, but the incorrect
select() usage has been corrected over there in a previous commit.
So now the define can be maintained near the array it relates to.
While the code might looks less 'efficient' as it is now sweeping all
1024 FDs when input devices during add/remove (hopefully there maybe some
compiler auto-vectorization/tzcnt use there). The main event loop is
slightly more 'efficient' (especially for the single device with input
fd=0 case) as it will only check the 1 FD each input event loop iteration,
not check all 20 FDs for data readyness everytime.
This encapsulates the expectation the 'fd' is in the permitted range for
the standard sizes 'fd_set'.
This is so there is some form of detection with issues in this area, if
the RLIMIT_NOFILE limit is increased.
The old code would only work is the fileno(stream) returned an fd
in the range 0 <= 19. It would silently fail, if the fd was in
the range 20..1023 because FD_SET() would work and syscall select()
would be limited to only look at the first 20 fd's. Ignoring any
fd's higher even if set.
This would theoretically cause high CPU usage due to select()
never blocking because there are no active fd's in the fd_set
as far as the kernel interprets the request and the kernel would
immediately return.
But reading the code the 1st argument to select() seems self
limiting for no good reason. It should be fileno(stream)+1 as
documented in man select(2).
Added the assertion as well, because we are trying to allow magic
to use fd's beyond the standard environmental limits. So it
becomes an assertion condition if the fd is outside the range
0..1023 because the FD_SET() macro will not operate correctly /
undefined-behaviour.
I can't find any user of this func in the codebase right now.
If you look at sim/SimRsim.c and the use of select() there, it is
correctly using select() to wait on a single fd over there. This
commit changes this code to match this correct usage.
the TiAlloc() and TiFree() code path.
The rationale of the changes:
Performing the one-time initialization check (first call to
mmapTileStore()) for every TiAlloc() is unnecessary if the decision code
is reworked to allow NULL pointers to exist in the computation and still
make the correct decision.
Use of 'unsigned long' for pointer arithmetic is not compatible with _WIN64
compiler model LLP64. Changed to 'pointertype'.
The computations addition/subtraction/greater-than were performed
multiple times. Seemed a convoulted method when a single operation
should be good enough.
The use of a single-linked-list with HEAD and TAIL does not make sense.
My gut is telling me that for the purpose of memory allocation a LIFO is
better as a free-and-reuse of the most recently freed item is more likely
to already be in the CPU cache and the oldest freed item is more likely
to have been evicted from CPU cache. So given the use of a custom
allocator and no support to reclaim/compact or manage fragmentation
other factor didn't carry any weight.
Technically TiAlloc() returns undefined memory and the first action of
the caller is to write new values, but the point remains that write is
more likely to cause eviction of something else from cache with the
original FIFO scheme.
Due to the free list use during TiFree() there is a write to each alloc
which will make the cache-line hotter in the cache (less likely to have
been evicted) when using LIFO scheme than FIFO scheme.
Further more the use of HEAD and TAIL had a far more complex
TileStoreFree() than was necessary even for such a list. A 4 line
method turned into 7 with multiple conditions tested when branching.
The TileStoreFreeList/TileStoreFreeList_end were public symbols which may
also impact the freedom the compiler has to optimise around them.
Using LIFO single-linked-list resulted in the removal of
TileStoreFreeList_end and associated simplification.
Use of 'static' for the methods mmapTileStore() and getTimeFromTileStore()
these are not public API so adding the 'static' give the compiler a hint
these methods maybe inlined as they are not accessile from outside this
compliation unit.
The -O3 assembly result looks quite healthy in achieving the original
goal of instruction and branch reductions with the compiler able to
inline all 3 methods into a single TiAlloc().
This is reducing nearby calls to TiGetClient() API when the value
can be looked up one time and stored in a local variable to make
other decisions about.
This is due to TiGetClient() potentially having a slightly higher
cost to call than previously, this is a kind of peephole
optimization approach (if I can see multiple getters used within
the window it got optmized).
'ticlient' was used for retrieval as ClientData so that future
greps across the codebase for `ti_client` should only match naked
access.
All naked access to `ti_client` now uses the function-like-macro
to encapsulate this action. This macro existed before this just
makes all sites utilize it.
Added additional INT and PTR variants to remove the programmer
load on thinking about casing and casts polluting the point
of use. So the use now looks cleaner.
Equivalent prototypes:
void TiSetClient(Tile*, ClientData)
void TiSetClientINT(Tile*, intptr_t) /* pointertype */
void TiSetClientPTR(Tile*, void*)
ClientData TiGetClient(Tile*)
intptr_t TiGetClientINT(Tile*) /* pointertype */
void *TiGetClientPTR(Tile*)
macro. Based on observation of cells in PDKs where ORIGIN and/or
FOREIGN are non-zero, added code that forces a correction of LEF
macro coordinates to match the GDS coordinates, with an
equivalent negative shift of the LEF macro ORIGIN to compensate.
Normally, both ORIGIN and FOREIGN will be zero and the added code
will do nothing. Note that this code does not handle the
additional optional orientation. A LEF macro with a different
coordinate system than its GDS is already weird; a LEF macro
with a different rotation than its GDS is hopefully something
that nobody ever does in practice. If needed, I'll cross that
bridge when I come to it.
manhattan shapes (especially minimum-sized ones) to be eliminated,
as these can survive a shrink-grow operation intended to get rid
of such shapes. This implementation may not be in its final form
but should suffice for now.
instance names in both the selection and in the root edit CellDef,
and then wipes duplicate names from the selection and regenerates
unique IDs. This avoids the unexpected behavior displayed by
magic in which a "copy" function renames the *original* instance
and gives the original name to the copied instance. This is not
only unexpected, but causes an error in which "undo" after
multiple copies fails to remove earlier copies because the name
change was not recorded, and the instance can no longer be found
by name.