mirror of https://github.com/openXC7/prjxray.git
timfuz: cleanup README
Signed-off-by: John McMaster <johndmcmaster@gmail.com>
parent edbd5f4ca9
commit d96bfd7a8b

@ -0,0 +1,288 @@
# Timing analysis fuzzer (timfuz)

WIP: 2018-09-10: this process is just starting to come together and is going to get significant cleanup. But here's the general idea.

This runs various designs through Vivado and processes the
resulting timing information in order to create very simple timing models.
While Vivado might have more involved models (say RC delays, fanout, etc),
timfuz creates simple models that bound realistic min and max element delays.

Currently this document focuses exclusively on fabric timing delays.

## Quick start

TODO: make this into a more formal makefile flow

```
# Pre-processing
# Create speed.json
./speed.sh

# Create csvs
make N=1
csv=specimen_001/timing3.csv

# Main workflow
# Discover which variables can be separated
python3 timfuz_rref.py --simplify --out sub.json $csv
# Verify sub.json makes a solvable solution
python3 checksub.py --sub-json sub.json group.csv
# Separate variables
python3 csv_flat2group.py --sub-json sub.json --strict $csv group.csv
# Create a rough timing model that approximately fits the given paths
python3 solve_leastsq.py --sub-json sub.json group.csv --out leastsq.csv
# Tweak the rough timing model, making sure all constraints are satisfied
python3 solve_linprog.py --sub-json sub.json --sub-csv leastsq.csv --massage group.csv --out linprog.csv
# Take separated variables and back-annotate them to the original timing variables
python3 csv_group2flat.py --sub-json sub.json --sort linprog.csv flat.csv

# Final processing
# Create tiles.json (where timing models are in the tile fabric)
python3 tile_txt2json.py timgrid/specimen_001/tiles.txt tiles.json
# Insert timing delays into actual tile layouts
python3 tile_annotate.py flat.csv tilea.json
```

## Vivado background

Examples are for an XC750T on Vivado 2017.2.

TODO maybe move to: https://github.com/SymbiFlow/prjxray/wiki/Timing

### Speed index

Vivado seems to associate each delay model with a "speed index".
The fabric has these on two kinds of elements: wires (ie one delay element per tile) and pips.
For example, LUT output node A (ex: CLBLL_L_X12Y100/CLBLL_LL_A) has a single wire, also called CLBLL_L_X12Y100/CLBLL_LL_A.
This has speed index 733. Speed models can be queried, and we find this corresponds to model C_CLBLL_LL_A.

There are various speed model types:
* bel_delay
* buffer
* buffer_switch
* content_version
* functional
* inpin
* outpin
* parameters
* pass_transistor
* switch
* table_lookup
* tl_buffer
* vt_limits
* wire

IIRC the interconnect is only composed of switch and wire types.

Indices with value 65535 (0xFFFF) never appear in actual timing paths, so presumably they mark models that are not used directly.
They are assigned to some special models such as those of type "content_version".
For example, the "xilinx" model is of type "content_version".

There are also "cost codes", but these seem to be very coarse (there are only around 30 of them)
and are suspected to be related more to PnR than to the timing model.

### Timing paths

The Vivado timing analyzer can easily output the following:
* Full: delay from BEL pin to BEL pin
* Interconnect only (ICO): delay from BEL pin to BEL pin, but only report interconnect delays (ie exclude site delays)

There is also theoretically an option to report delays up to a specific pip,
but this option is poorly documented and I was unable to get it to work.

Each timing path reports min and max values for both a fast process and a slow process. So four process values are reported in total:
* fast_max
* fast_min
* slow_max
* slow_min

For example, if the device is at end of life, was poorly made, and is at an extreme temperature, the delay may be up to the slow_max value.
Since ICO can be reported for each of these, fully analyzing a timing path results in 8 values.

Finally, part of this effort was analyzing tile regularity to discover what a reasonably compact timing model would be.
We verified that all tiles of the same type have exactly the same delay elements.

## Methodology

Make sure you've read the Vivado background section first.

### Background

This section briefly describes some of the mathematics used by this technique that readers may not be familiar with.
These definitions are intended to be good enough to provide a high level understanding and may not be precise.

Numerical analysis: the study of algorithms that use numerical approximation (as opposed to general symbolic manipulations)

numpy: a popular numerical analysis Python library. Often written np (import numpy as np).

scipy: provides higher level functionality on top of numpy

sympy ("symbolic python"): like numpy, but designed to work with exact rational numbers.
For example, Python actually stores 0.1 as 0.1000000000000000055511151231257827021181583404541015625.
However, sympy can represent this as the fraction 1/10, eliminating numerical approximation issues.
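To illustrate, a minimal sketch (not from timfuz itself) using Python's decimal module to expose the float approximation and sympy to keep the exact fraction:

```python
from decimal import Decimal
from sympy import Rational

# The float literal 0.1 is actually stored as the nearest binary fraction:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# sympy can carry the exact rational 1/10 instead:
x = Rational(1, 10)
print(x * 10)  # exactly 1, no rounding error
```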

Least squares (ex: scipy.optimize.least_squares): an approximation method to do a best fit of several variables to a set of equations.
For example, given the equations "x = 1" and "x = 2" there isn't an exact solution.
However, "x = 1.5" is a good compromise since it reasonably satisfies both equations.
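As a concrete sketch (assuming scipy is installed; the tiny residual function here is illustrative, not timfuz code):

```python
from scipy.optimize import least_squares

# Residuals for the over-determined system "x = 1" and "x = 2"
def residuals(v):
    x = v[0]
    return [x - 1.0, x - 2.0]

# Minimizes (x - 1)^2 + (x - 2)^2, whose optimum is the compromise x = 1.5
res = least_squares(residuals, x0=[0.0])
print(res.x[0])
```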

Linear programming (ex: scipy.optimize.linprog aka linprog): an approximation method that finds a set of variables satisfying a set of inequalities.
For example, given "x >= 1" and "x + y >= 3", it can find values such as x = 1 and y = 2 that satisfy every constraint.
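A minimal sketch of that example (constraint values are illustrative only): scipy's linprog takes "<=" constraints, so ">=" rows are negated, and a small objective keeps the values from drifting upward:

```python
from scipy.optimize import linprog

# Satisfy x >= 1 and x + y >= 3 while minimizing x + y.
# scipy expects A_ub @ v <= b_ub, so ">=" constraints are negated.
c = [1, 1]
A_ub = [[-1, 0],    # -x      <= -1  (ie x >= 1)
        [-1, -1]]   # -x - y  <= -3  (ie x + y >= 3)
b_ub = [-1, -3]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.fun)  # optimal objective: 3.0
```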

Reduced row echelon form (RREF, ex: sympy.Matrix.rref): the simplest form to which a system of linear equations can be reduced.
For example, given "x = 1" and "x + y = 9", one can solve for "x = 1" and "y = 8".
However, given "x + y = 1" and "x + y + z = 9", there aren't enough equations to solve for every variable.
In this case RREF provides a best effort by giving the ratios between correlated variables.
In each of these ratios one variable is normalized to 1 and is called the "pivot".
Note that if numpy.linalg.solve encounters an unsolvable matrix it may either complain
or generate a false solution due to numerical approximation issues.

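The under-constrained example above can be run through sympy directly (sympy keeps integer inputs exact, so no conversion is needed for this sketch):

```python
from sympy import Matrix

# Augmented matrix for "x + y = 1" and "x + y + z = 9"
m = Matrix([
    [1, 1, 0, 1],
    [1, 1, 1, 9],
])
rref_m, pivots = m.rref()
print(rref_m)   # Matrix([[1, 1, 0, 1], [0, 0, 1, 8]])
print(pivots)   # (0, 2): x and z are pivots, y remains free
```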
### What didn't work

First, some quick background on things that didn't work, to illustrate why the current approach was chosen.
I first tried to directly throw things into linprog, but it unfairly weighted towards arbitrary shared variables. For example, feeding in:
* t0 >= 10
* t0 + t1 >= 100

It would declare "t0 = 100", "t1 = 0" instead of the more intuitive "t0 = 10", "t1 = 90".
I tried to work around this in several ways, notably subtracting equations from each other to produce additional constraints.
This worked okay, but was relatively slow and wasn't approaching nearly solved solutions, even when throwing a lot of data at it.
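This bias can be reproduced with a small sketch (values illustrative): both vertices (t0=10, t1=90) and (t0=100, t1=0) satisfy the constraints with the same total, so the solver is free to pick the lopsided one:

```python
from scipy.optimize import linprog

# t0 >= 10 and t0 + t1 >= 100, minimizing the total delay
c = [1, 1]
A_ub = [[-1, 0], [-1, -1]]
b_ub = [-10, -100]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
# The optimum total is 100, but how it splits between t0 and t1 is
# arbitrary: any t0 in [10, 100] with t1 = 100 - t0 is equally optimal.
print(res.x, res.x[0] + res.x[1])
```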

Next we tried randomly combining a bunch of the equations together and solving them like a regular linear algebra matrix (numpy.linalg.solve).
However, this illustrated that the system was under-constrained.
Further analysis revealed that there are some delay element combinations that simply can't be linearly separated.
This was checked primarily using numpy.linalg.matrix_rank, with some use of numpy.linalg.slogdet.
matrix_rank was preferred over slogdet since it's more flexible with non-square matrices.

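The rank check that exposed the under-constrained system can be sketched as follows (toy matrix, not real timing data):

```python
import numpy as np

# Rows are path equations, columns are delay variables (x, y, z).
# "x + y = 1" and "x + y + z = 9" never separate x from y.
A = np.array([
    [1, 1, 0],
    [1, 1, 1],
])
# Rank 2 with 3 columns: under-constrained, so x and y
# cannot be solved for individually.
print(np.linalg.matrix_rank(A))
```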
### Process

The above ultimately led to the idea that we should come up with a set of substitutions that would make the system solvable. This has several advantages:
* Easy to evaluate which variables aren't covered well enough by the source data
* Easy to evaluate which variables weren't solved properly (if it's fully constrained it should have had a non-zero delay)

At a high level, the above learnings gave this process:
* Find correlated variables by using RREF (sympy.Matrix.rref) to create variable groups
  - Note the pivots
  - You must input a fractional type (ex: fractions.Fraction, but surprisingly not int) to get exact results, otherwise it seems to fall back to numerical approximation
  - This is by far the most computationally expensive step
  - Mixing RREF substitutions from one data set into another may not be recommended
* Use the RREF result to substitute groups into the input data, creating new meta variables, but ultimately reducing the number of columns
* Pick a corner
  - Examples assume fast_max, but other corners are applicable with appropriate column and sign changes
* De-duplicate by removing equations that are less constrained
  - Ex: if solving for a max corner and given:
    - t0 + t1 >= 10
    - t0 + t1 >= 12
  - The first equation is redundant since the second provides a stricter constraint
  - This significantly reduces computation time
* Use least squares (scipy.optimize.least_squares) to fit variables near the input constraints
  - Helps fairly weight delays vs the original input constraints
  - Does not guarantee all constraints are met. For example, if this was put in (ignoring that these would have been de-duplicated):
    - t0 = 10
    - t0 = 12
  - It may decide something like t0 = 11, which means the second constraint was not satisfied given we actually want t0 >= 12
* Use linear programming (scipy.optimize.linprog aka linprog) to formally meet all remaining constraints
  - Start by filtering out all constraints that are already met. This should eliminate nearly all equations
* Map the resulting constraints onto the different tile types
  - Group delays map onto the group pivot variable, typically setting the other elements to 0 (if the processed set is not the one used to create the pivots they may be non-zero)
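The de-duplication step above can be sketched as follows (a hypothetical helper, not the actual timfuz code): for a max corner, rows with the same variable set are dominated by the largest right-hand side:

```python
def dedup_max_corner(rows):
    """rows: iterable of (variables, rhs) where variables is a tuple of
    (name, count) terms. For a max corner, keep only the strictest
    (largest) right-hand side per unique variable set."""
    best = {}
    for variables, rhs in rows:
        key = tuple(sorted(variables))
        if key not in best or rhs > best[key]:
            best[key] = rhs
    return [(list(key), rhs) for key, rhs in best.items()]

rows = [
    ((('t0', 1), ('t1', 1)), 10),
    ((('t0', 1), ('t1', 1)), 12),  # stricter: dominates the row above
]
out = dedup_max_corner(rows)
print(out)  # one row remains, carrying rhs 12
```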

## TODO

Milestone 1 (MVP)
* DONE
* Provide any process corner with at least some of the fabric

Milestone 2
* Provide all four fabric corners
* Simple makefile based flow
* Cleanup/separate fabric input targets

Milestone 3
* Create site delay model

Final
* Investigate ZERO models
* Investigate virtual switchboxes
* Compare our output vs Xilinx's on random designs

### Improve test cases

Test cases are somewhat random right now. We could make much more targeted cases using custom routing to improve various fanout estimates and such.
Also, there are a lot more elements that are not covered.
At a minimum these should be moved to their own directory.

### ZERO models

Background: there are a number of speed models with the name ZERO in them.
These generally seem to be zero delay, although this needs more investigation.

Example: see the virtual switchbox item below.

The timing models will probably improve significantly if these are removed.
In the past I was removing them, but decided to keep them in for now in the spirit of being more conservative.

They include:
* _BSW_CLK_ZERO
* BSW_CLK_ZERO
* _BSW_ZERO
* BSW_ZERO
* _B_ZERO
* B_ZERO
* C_CLK_ZERO
* C_DSP_ZERO
* C_ZERO
* I_ZERO
* _O_ZERO
* O_ZERO
* RC_ZERO
* _R_ZERO
* R_ZERO

### Virtual switchboxes

Background: several low level configuration details are abstracted with virtual configurable elements.
For example, LUT inputs can be rearranged to reduce routing congestion.
However, the LUT configuration must be changed to match the switched inputs.
This is handled by the CLBLL_L_INTER switchbox, which doesn't encode any physical configuration bits.
However, it contains PIPs with delay models.

For example, LUT A input A1 connects to node CLBLM_M_A1; its pip junction CLBLM_M_A1 has PIP CLBLM_IMUX7->CLBLM_M_A1
with speed index 659 (R_ZERO).

This might be further evidence for the related point that ZERO models should probably be removed.

### Incorporate fanout

We could probably significantly improve model granularity by studying the impact of fanout on delay.

### Investigate RC delays

Accuracy could likely be improved significantly by moving to SPICE based models, but this will take significantly more characterization.

### Characterize real hardware

A few people have expressed interest in running tests on real hardware. This will take some thought given we don't have direct access.

### Review approximation errors

Ex: one known issue is that the objective function linearly weights small and large delays.
This is only recommended when variables are approximately the same order of magnitude.
For example, carry chain delays are on the order of 7 ps while other delays are around 100 ps.
It's very easy to put a large delay on the carry chain when it could have been more appropriately put somewhere else.

@ -1,71 +0,0 @@
Timing analysis fuzzer

This runs some random designs through Vivado and extracts timing information in order to derive timing models.
While Vivado has more involved RC (spice?) models incorporating fanout and other things,
for now we are shooting for simple, conservative models with a min and max timing delay.

*******************************************************************************
Background
*******************************************************************************

Vivado seems to associate each delay model with a "speed index".
In particular, we are currently looking at pips and wires, each of which has a speed index associated with it.
For every timing path, we record the total delay from one site to another, excluding site delays
(the timing analyzer provides an option to make this easy).
We then walk along the path and record all wires and pips in between.
These are converted to their associated speed indexes.
This gives an equation stating that a series of speed indexes added up to a certain delay value.
These equations are then fed into scipy.optimize.linprog to give estimates for the delay models.

However, there are some complications. For example,
given a system of equations like:
t0 = 5
t0 + t1 = 10
t0 + t1 + t2 = 12
the solver puts all the delay in t0.
To get around this, we subtract equations from each other.

Some additional info here: https://github.com/SymbiFlow/prjxray/wiki/Timing

*******************************************************************************
Quick start
*******************************************************************************

./speed.sh
python timfuz_delay.py --cols-max 9 timfuz_dat/s1_timing2.txt

Which will report something like:
Delay on 36 / 162

Now add some more data in:
python timfuz_delay.py --cols-max 9 timfuz_dat/speed_json.json timfuz_dat/s*_timing2.txt

Which should get a few more delay elements, say:
Delay on 57 / 185

*******************************************************************************
From scratch
*******************************************************************************

Roughly something like this:
Edit generate.tcl
Uncomment speed_models2
Run "make N=1"
python speed_json.py specimen_001/speed_model.txt speed_json.json
Edit generate.tcl
Comment speed_models2
Run "make N=4" to generate some more timing data
Now run as in the quick start:
python timfuz_delay.py --cols-max 9 speed_json.json specimen_*/timing2.txt

*******************************************************************************
TODO:
*******************************************************************************

Verify elements are being imported correctly throughout the whole chain
Can any wires or similar be aggregated?
Ex: if a node consists of two wire delay models and that pair is never seen elsewhere
Look at virtual switchboxes. Can these be removed?
Look at suspicious elements like WIRE_RC_ZERO

@ -90,6 +90,7 @@ def run(fns_in, sub_json=None, verbose=False):
    print
    # https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.linalg.matrix_rank.html
    print('rank: %s / %d col' % (np.linalg.matrix_rank(Amat), len(names)))
    # doesn't work on non-square matrices
    if 0:
        # https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.linalg.slogdet.html
        sign, logdet = np.linalg.slogdet(Amat)