Merge pull request #118 from mcmasterg/timfuz

fabric timing fuzzer
This commit is contained in:
John McMaster 2018-09-19 11:09:53 -07:00 committed by GitHub
commit e77842bec9
69 changed files with 8542 additions and 25 deletions


@ -1,10 +0,0 @@
.NOTPARALLEL:
SUBDIRS := $(patsubst %/,%, $(wildcard */))
.PHONY: $(SUBDIRS)
$(MAKECMDGOALS): $(SUBDIRS)
$(SUBDIRS):
$(MAKE) -C $@ -f ../Makefile.database $(MAKECMDGOALS)


@ -1,15 +0,0 @@
.PHONY: all clean
all: tilegrid.json
# Small dance to say that there is a single recipe that is run once to generate
# multiple files.
tileconn.json tilegrid.json: fuzzers
.INTERMEDIATE: fuzzers
fuzzers: SHELL:=/bin/bash
fuzzers: settings.sh
source settings.sh && $(MAKE) -C ../../fuzzers all
clean:
rm -f *.db tileconn.json tilegrid.json *.yaml
rm -rf html

fuzzers/007-timing/.gitignore

@ -0,0 +1,9 @@
/.Xil
/design/
/design.bit
/design.bits
/design.dcp
/usage_statistics_webtalk.*
/vivado*
/specimen_*
build_speed


@ -0,0 +1,26 @@
# for now hammering on just picorv32
# consider instead aggregating multiple projects
PRJ?=picorv32
PRJN?=8
all: build/timgrid-v.json
clean:
rm -rf build
cd speed && $(MAKE) clean
cd timgrid && $(MAKE) clean
cd projects/$(PRJ) && $(MAKE) clean
speed/build/speed.json:
cd speed && $(MAKE)
timgrid/build/timgrid.json:
cd timgrid && $(MAKE)
build/timgrid-v.json: projects/$(PRJ)/build/timgrid-v.json
mkdir -p build
cp projects/$(PRJ)/build/timgrid-v.json build/timgrid-v.json
projects/$(PRJ)/build/timgrid-v.json: speed/build/speed.json timgrid/build/timgrid.json
cd projects/$(PRJ) && $(MAKE) N=$(PRJN)


@ -0,0 +1,267 @@
# Timing analysis fuzzer (timfuz)
WIP: 2018-09-10: this process is just coming together and is going to get significant cleanup. But here's the general idea.
This runs various designs through Vivado and processes the
resulting timing information in order to create very simple timing models.
While Vivado might have more involved models (say RC delays, fanout, etc),
timfuz creates simple models that bound realistic min and max element delays.
Currently this document focuses exclusively on fabric timing delays.
## Quick start
```
make -j$(nproc)
```
This will take a relatively long time (say 45 min) and generate build/timgrid-v.json.
You can do a quicker test run (say 3 min) using:
```
make PRJ=oneblinkw PRJN=1 -j$(nproc)
```
## Vivado background
Examples are for an XC750T on Vivado 2017.2.
TODO maybe move to: https://github.com/SymbiFlow/prjxray/wiki/Timing
### Speed index
Vivado seems to associate each delay model with a "speed index".
The fabric has these in two elements: wires (i.e. one delay element per tile) and pips.
For example, LUT output node A (ex: CLBLL_L_X12Y100/CLBLL_LL_A) has a single wire, also called CLBLL_L_X12Y100/CLBLL_LL_A.
This has speed index 733. Speed models can be queried and we find this corresponds to model C_CLBLL_LL_A.
There are various speed model types:
* bel_delay
* buffer
* buffer_switch
* content_version
* functional
* inpin
* outpin
* parameters
* pass_transistor
* switch
* table_lookup
* tl_buffer
* vt_limits
* wire
IIRC the interconnect is only composed of switch and wire types.
Indices with value 65535 (0xFFFF) never appear in the fabric; presumably this value marks unused models.
It is assigned to some special models, such as those of type "content_version".
For example, the "xilinx" model is of type "content_version".
There are also "cost codes", but these seem to be very coarse (only around 30 of them)
and are suspected to be related more to PnR than to the timing model.
### Timing paths
The Vivado timing analyzer can easily output the following:
* Full: delay from BEL pin to BEL pin
* Interconnect only (ICO): delay from BEL pin to BEL pin, but only report interconnect delays (i.e. exclude site delays)
There is also theoretically an option to report delays up to a specific pip,
but this option is poorly documented and I was unable to get it to work.
Each timing path reports min and max values for both a fast and a slow process, so four process values are reported in total:
* fast_max
* fast_min
* slow_max
* slow_min
For example, if the device is end of life, was poorly made, and at an extreme temperature, the delay may be up to the slow_max value.
Since ICO can be reported for each of these, fully analyzing a timing path results in 8 values.
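These corner values flow through the rest of the pipeline as simple CSV rows (header: `ico,fast_max fast_min slow_max slow_min,rows...`). A minimal parsing sketch follows; the field layout is inferred from the CSV writers in this PR, and the sample row is hypothetical:
```
# Sketch: parse one row of the corner CSV format used by this PR's tools.
def parse_row(line):
    fields = line.strip().split(',')
    ico = int(fields[0])  # 1 = interconnect-only delays
    # Four process corner values: fast_max fast_min slow_max slow_min
    corners = [float(x) for x in fields[1].split()]
    # Remaining fields are "count name" pairs: delay elements on the path
    terms = {}
    for field in fields[2:]:
        count, name = field.split(' ', 1)
        terms[name] = int(count)
    return ico, corners, terms

# Hypothetical row: two C_CLBLL_LL_A wires and one R_ZERO pip on the path
print(parse_row('1,120 80 250 150,2 C_CLBLL_LL_A,1 R_ZERO'))
```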
Finally, part of this work was analyzing tile regularity to discover what a reasonably compact timing model would look like.
We verified that all tiles of the same type have exactly the same delay elements.
## Methodology
Make sure you've read the Vivado background section first.
### Background
This section briefly describes some of the mathematics used by this technique that readers may not be familiar with.
These definitions are intended to be good enough to provide a high level understanding and may not be precise.
Numerical analysis: the study of algorithms that use numerical approximation (as opposed to general symbolic manipulations)
numpy: a popular numerical analysis python library. Often written np (import numpy as np).
scipy: provides higher level functionality on top of numpy
sympy ("symbolic python"): like numpy, but designed to work with exact rational numbers.
For example, python actually stores 0.1 as 0.1000000000000000055511151231257827021181583404541015625.
However, sympy can represent this as the fraction 1/10, eliminating numerical approximation issues.
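A minimal sketch of the difference:
```
from fractions import Fraction
from decimal import Decimal

# The float literal 0.1 is really the nearest binary fraction:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# Rational arithmetic stays exact, so repeated row operations
# (as in RREF) accumulate no rounding error:
print(Fraction(1, 10) * 3 == Fraction(3, 10))  # True
print(0.1 * 3 == 0.3)                          # False
```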
Least squares (ex: scipy.optimize.least_squares): approximation method to do a best fit of several variables to a set of equations.
For example, given the equations "x = 1" and "x = 2" there isn't an exact solution.
However, "x = 1.5" is a good compromise since its reasonably solves both equations.
Linear programming (ex: scipy.optimize.linprog aka linprog): optimization method that finds a set of variables satisfying a set of inequalities, typically while minimizing a linear objective.
For example, given "t0 >= 10" and "t0 + t1 >= 100", linprog finds values for t0 and t1 that satisfy both inequalities.
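A minimal sketch using those constraints:
```
from scipy.optimize import linprog

# linprog wants A_ub @ x <= b_ub, so ">=" rows are negated:
#   t0 >= 10        ->  -t0      <= -10
#   t0 + t1 >= 100  ->  -t0 - t1 <= -100
A_ub = [[-1, 0], [-1, -1]]
b_ub = [-10, -100]
c = [1, 1]  # minimize total delay t0 + t1 (variables default to >= 0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x)  # any split with t0 >= 10 and t0 + t1 = 100 is optimal
```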
Reduced row echelon form (RREF, ex: sympy.Matrix.rref): the simplest form that a system of linear equations can be solved to.
For example, given "x = 1" and "x + y = 9", one can solve for "x = 1" and "y = 8".
However, given "x + y = 1" and "x + y + z = 9", there aren't enough variables to solve this fully.
In this case RREF provides a best effort by giving the ratios between correlated variables.
One variable is normalized to 1 in each of these ratios and is called the "pivot".
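A minimal sketch of that example (note the fractional inputs, per the process notes below):
```
import sympy
from fractions import Fraction

# Augmented matrix for "x + y = 1" and "x + y + z = 9"
rows = [[1, 1, 0, 1], [1, 1, 1, 9]]
m = sympy.Matrix([[Fraction(v) for v in row] for row in rows])
rref, pivots = m.rref()
sympy.pprint(rref)  # [1 1 0 1; 0 0 1 8]: x + y = 1 and z = 8
print(pivots)       # (0, 2): x and z are pivots, y stays free
```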
Note that if numpy.linalg.solve encounters an unsolvable matrix it may either complain
or generate a false solution due to numerical approximation issues.
### What didn't work
First some quick background on things that didn't work to illustrate why the current approach was chosen.
I first tried to throw things directly into linprog, but it unfairly weighted towards arbitrary shared variables. For example, feeding in:
* t0 >= 10
* t0 + t1 >= 100
It would declare "t0 = 100", "t1 = 0" instead of the more intuitive "t0 = 10", "t1 = 90".
I tried to work around this in several ways, notably subtracting equations from each other to produce additional constraints.
This worked okay, but was relatively slow and wasn't converging on nearly solved solutions, even when throwing a lot of data at it.
Next we tried randomly combining a bunch of the equations together and solving them like a regular linear algebra matrix (numpy.linalg.solve).
However, this illustrated that the system was under-constrained.
Further analysis revealed that there are some delay element combinations that simply can't be linearly separated.
This was checked primarily using numpy.linalg.matrix_rank, with some use of numpy.linalg.slogdet.
matrix_rank was preferred over slogdet since it's more flexible with non-square matrices.
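A minimal sketch of the rank check on an under-constrained system:
```
import numpy as np

# Columns are delay variables; each row is an observed path.
# t0 and t1 always appear together, so they can't be separated:
A = np.array([[1, 1, 0],
              [1, 1, 1]])
print(np.linalg.matrix_rank(A), A.shape[1])  # 2 < 3: not fully solvable
```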
### Process
The above ultimately led to the idea that we should come up with a set of substitutions that would make the system solvable. This has several advantages:
* Easy to evaluate which variables aren't covered well enough by source data
* Easy to evaluate which variables weren't solved properly (if it's fully constrained it should have had a non-zero delay)
At a high level, the above learnings gave this process:
* Find correlated variables by using RREF (sympy.Matrix.rref) to create variable groups
- Note pivots
- You must input a fractional type (ex: fractions.Fraction, but surprisingly not int) to get exact results, otherwise it seems to fall back to numerical approximation
- This is by far the most computationally expensive step
- Mixing RREF substitutions from one data set to another may not be recommended
* Use RREF result to substitute groups on input data, creating new meta variables, but ultimately reducing the number of columns
* Pick a corner
- Examples assume fast_max, but other corners are applicable with appropriate column and sign changes
* De-duplicate by removing equations that are less constrained (see the sketch after this list)
- Ex: if solving for a max corner and given:
- t0 + t1 >= 10
- t0 + t1 >= 12
- The first equation is redundant since the second provides a stricter constraint
- This significantly reduces computational time
* Use least squares (scipy.optimize.least_squares) to fit variables near input constraints
- Helps fairly weight delays vs the original input constraints
- Does not guarantee all constraints are met. For example, if this were put in (ignoring that these would have been de-duplicated):
- t0 = 10
- t0 = 12
- It may decide something like t0 = 11, which means that the second constraint was not satisfied given we actually want t0 >= 12
* Use linear programming (scipy.optimize.linprog aka linprog) to formally meet all remaining constraints
- Start by filtering out all constraints that are already met. This should eliminate nearly all equations
* Map resulting constraints onto different tile types
- Group delays map onto the group pivot variable, typically setting other elements to 0 (if the processed set is not the one used to create the pivots they may be non-zero)
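A rough sketch of the de-duplication step referenced above; this is illustrative, not the exact simplify_rows implementation (Ads rows are name:count dicts, b the corresponding bounds, assuming a max corner):
```
def dedup_max_corner(Ads, b):
    # For "t0 + t1 >= 10" and "t0 + t1 >= 12", only ">= 12" survives:
    # identical left-hand sides keep the largest lower bound.
    best = {}
    for row_ds, row_b in zip(Ads, b):
        key = tuple(sorted(row_ds.items()))
        if key not in best or row_b > best[key][1]:
            best[key] = (row_ds, row_b)
    rows = list(best.values())
    return [ds for ds, _b in rows], [rb for _ds, rb in rows]
```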
## TODO
Milestone 1 (MVP)
* DONE
* Provide any process corner with at least some of the fabric
Milestone 2
* Provide all four fabric corners
* Simple makefile based flow
* Cleanup/separate fabric input targets
Milestone 3
* Create site delay model
Final
* Investigate ZERO
* Investigate virtual switchboxes
* Compare our vs Xilinx output on random designs
### Improve test cases
Test cases are somewhat random right now. We could make much more targeted cases using custom routing to improve various fanout estimates and such.
Also there are a lot more elements that are not covered.
At a minimum these should be moved to their own directory.
### ZERO models
Background: there are a number of speed models with the name ZERO in them.
These generally seem to be zero delay, although this needs more investigation.
Example: see virtual switchbox item below
The timing models will probably significantly improve if these are removed.
In the past I was removing them, but decided to keep them in for now in the spirit of being more conservative.
They include:
* _BSW_CLK_ZERO
* BSW_CLK_ZERO
* _BSW_ZERO
* BSW_ZERO
* _B_ZERO
* B_ZERO
* C_CLK_ZERO
* C_DSP_ZERO
* C_ZERO
* I_ZERO
* _O_ZERO
* O_ZERO
* RC_ZERO
* _R_ZERO
* R_ZERO
### Virtual switchboxes
Background: several low level configuration details are abstracted with virtual configurable elements.
For example, LUT inputs can be rearranged to reduce routing congestion.
However, the LUT configuration must be changed to match the switched inputs.
This is handled by the CLBLL_L_INTER switchbox, which doesn't encode any physical configuration bits.
However, this contains PIPs with delay models.
For example, LUT A input A1 has node CLBLM_M_A1 coming from pip junction CLBLM_M_A1, which has PIP CLBLM_IMUX7->CLBLM_M_A1
with speed index 659 (R_ZERO).
This may be further evidence for the related point that ZERO models should probably be removed.
### Incorporate fanout
We could probably significantly improve model granularity by studying the impact of fanout on delay.
### Investigate RC delays
We suspect accuracy could be significantly improved by moving to SPICE-based models, but this will take significantly more characterization.
### Characterize real hardware
A few people have expressed interest in running tests on real hardware. This will take some thought given we don't have direct access.
### Review approximation errors
Ex: one known issue is that the objective function linearly weights small and large delays.
This is only recommended when variables are approximately the same order of magnitude.
For example, carry chain delays are on the order of 7 ps while other delays are 100 ps.
It's very easy to put a large delay on the carry chain when it could have been more appropriately placed elsewhere.
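One possible mitigation, sketched below, is to weight each residual by its measured path delay so 7 ps and 100 ps elements are judged proportionally; this is an idea to investigate, not something implemented in this PR:
```
import numpy as np
from scipy.optimize import least_squares

def solve_relative(Anp, b, x0):
    # Anp: rows of element counts, b: measured path delays (ps).
    # Dividing by b makes errors relative, so small carry-chain
    # delays aren't drowned out by large routing delays.
    def residuals(x):
        return (Anp @ x - b) / b
    return least_squares(residuals, x0)
```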


@ -0,0 +1,73 @@
'''
pr0ntools
Benchmarking utility
Copyright 2010 John McMaster
'''
import time
def time_str(delta):
fraction = delta % 1
delta -= fraction
delta = int(delta)
seconds = delta % 60
# integer division: this helper is imported from Python 3 scripts
delta //= 60
minutes = delta % 60
delta //= 60
hours = delta
return '%02d:%02d:%02d.%04d' % (hours, minutes, seconds, fraction * 10000)
class Benchmark:
start_time = None
end_time = None
def __init__(self, max_items=None):
# For the lazy
self.start_time = time.time()
self.end_time = None
self.max_items = max_items
self.cur_items = 0
def start(self):
self.start_time = time.time()
self.end_time = None
self.cur_items = 0
def stop(self):
self.end_time = time.time()
def advance(self, n=1):
self.cur_items += n
def set_cur_items(self, n):
self.cur_items = n
def delta_s(self):
if self.end_time:
return self.end_time - self.start_time
else:
return time.time() - self.start_time
def __str__(self):
if self.end_time:
return time_str(self.end_time - self.start_time)
elif self.max_items:
cur_time = time.time()
delta_t = cur_time - self.start_time
rate_s = 'N/A'
if delta_t > 0.000001:
rate = self.cur_items / (delta_t)
rate_s = '%f items / sec' % rate
if rate == 0:
eta_str = 'inf'
else:
remaining = (self.max_items - self.cur_items) / rate
eta_str = time_str(remaining)
else:
eta_str = "indeterminate"
return '%d / %d, ETA: %s @ %s' % (
self.cur_items, self.max_items, eta_str, rate_s)
else:
return time_str(time.time() - self.start_time)
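# Usage sketch (not part of the original file): time a loop with ETA reporting.
#   bench = Benchmark(max_items=1000)
#   for i in range(1000):
#       do_work(i)
#       bench.advance()
#       if i % 100 == 0:
#           print(bench)  # "n / 1000, ETA: hh:mm:ss.ffff @ r items / sec"
#   bench.stop()
#   print('Took %s' % bench)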


@ -0,0 +1,134 @@
#!/usr/bin/env python3
from timfuz import Benchmark, Ar_di2np, Ar_ds2t, A_di2ds, A_ds2di, loadc_Ads_b, index_names, A_ds2np, load_sub, run_sub_json
import numpy as np
import glob
import json
import math
import random  # needed by Adi2matrix_random's random row assignment
from collections import OrderedDict
from fractions import Fraction
def Adi2matrix_random(A_ubd, b_ub, names):
# random assignment
# was making some empty rows
A_ret = [np.zeros(len(names)) for _i in range(len(names))]
b_ret = np.zeros(len(names))
for row, b in zip(A_ubd, b_ub):
# Randomly assign to a row
dst_rowi = random.randint(0, len(names) - 1)
rownp = Ar_di2np(row, cols=len(names), sf=1)
A_ret[dst_rowi] = np.add(A_ret[dst_rowi], rownp)
b_ret[dst_rowi] += b
return A_ret, b_ret
def Ads2matrix_linear(Ads, b):
names, Adi = A_ds2di(Ads)
cols = len(names)
rows_out = len(b)
A_ret = [np.zeros(cols) for _i in range(rows_out)]
b_ret = np.zeros(rows_out)
dst_rowi = 0
for row_di, row_b in zip(Adi, b):
row_np = Ar_di2np(row_di, cols)
A_ret[dst_rowi] = np.add(A_ret[dst_rowi], row_np)
b_ret[dst_rowi] += row_b
dst_rowi = (dst_rowi + 1) % rows_out
return A_ret, b_ret
def pmatrix(Anp, s):
import sympy
msym = sympy.Matrix(Anp)
print(s)
sympy.pprint(msym)
def pds(Ads, s):
names, Anp = A_ds2np(Ads)
pmatrix(Anp, s)
print('Names: %s' % (names, ))
def run(fns_in, sub_json=None, verbose=False):
assert len(fns_in) > 0
# arbitrary corner...data is thrown away
Ads, b = loadc_Ads_b(fns_in, "slow_max", ico=True)
if sub_json:
print('Subbing JSON %u rows' % len(Ads))
#pds(Ads, 'Orig')
names_old = index_names(Ads)
run_sub_json(Ads, sub_json, verbose=verbose)
names_new = index_names(Ads)
print("Sub: %u => %u names" % (len(names_old), len(names_new)))
print(names_new)
print('Subbed JSON %u rows' % len(Ads))
names = names_new
#pds(Ads, 'Sub')
else:
names = index_names(Ads)
# Squash into a matrix
# A_ub2, b_ub2 = Adi2matrix_random(A_ubd, b, names)
Amat, _bmat = Ads2matrix_linear(Ads, b)
#pmatrix(Amat, 'Matrix')
'''
The matrix must be fully ranked to even be considered reasonable
Even then, floating point error could *possibly* make it appear fully ranked, although probably not since we have whole numbers
Hence the slogdet check
'''
print()
# https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.linalg.matrix_rank.html
rank = np.linalg.matrix_rank(Amat)
print('rank: %s / %d col' % (rank, len(names)))
# doesn't work on non-square matrices
if 0:
# https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.linalg.slogdet.html
sign, logdet = np.linalg.slogdet(Amat)
# If the determinant is zero, then sign will be 0 and logdet will be -Inf
if sign == 0 and logdet == float('-inf'):
print('slogdet :( : 0')
else:
print('slogdet :) : %s, %s' % (sign, logdet))
if rank != len(names):
raise Exception(
"Matrix not fully ranked w/ %u / %u" % (rank, len(names)))
def main():
import argparse
parser = argparse.ArgumentParser(
description='Check if sub.json would make a linear equation solvable')
parser.add_argument('--verbose', action='store_true', help='')
parser.add_argument('--sub-json', help='')
parser.add_argument('fns_in', nargs='*', help='timing3.csv input files')
args = parser.parse_args()
# Store options in dict to ease passing through functions
bench = Benchmark()
fns_in = args.fns_in
if not fns_in:
fns_in = glob.glob('specimen_*/timing3.csv')
sub_json = None
if args.sub_json:
sub_json = load_sub(args.sub_json)
try:
run(sub_json=sub_json, fns_in=fns_in, verbose=args.verbose)
finally:
print('Exiting after %s' % bench)
if __name__ == '__main__':
main()


@ -0,0 +1,58 @@
#!/usr/bin/env python3
from timfuz import Benchmark, simplify_rows, loadc_Ads_b
import glob
def run(fout, fns_in, corner, verbose=0):
Ads, b = loadc_Ads_b(fns_in, corner, ico=True)
Ads, b = simplify_rows(Ads, b, corner=corner)
fout.write('ico,fast_max fast_min slow_max slow_min,rows...\n')
for row_b, row_ds in zip(b, Ads):
# write in same format, but just stick to this corner
out_b = [str(row_b) for _i in range(4)]
ico = '1'
items = [ico, ' '.join(out_b)]
for k, v in sorted(row_ds.items()):
items.append('%u %s' % (v, k))
fout.write(','.join(items) + '\n')
def main():
import argparse
parser = argparse.ArgumentParser(
description='Create a .csv with a single process corner')
parser.add_argument('--verbose', type=int, help='')
parser.add_argument(
'--auto-name', action='store_true', help='timing3.csv => timing3c.csv')
parser.add_argument('--out', default=None, help='Output csv')
parser.add_argument('--corner', help='Process corner (fast_max, fast_min, slow_max, slow_min)')
parser.add_argument('fns_in', nargs='+', help='timing3.csv input files')
args = parser.parse_args()
bench = Benchmark()
fnout = args.out
if fnout is None:
if args.auto_name:
assert len(args.fns_in) == 1
fnin = args.fns_in[0]
fnout = fnin.replace('timing3.csv', 'timing3c.csv')
assert fnout != fnin, 'Expect timing3.csv in'
else:
fnout = '/dev/stdout'
print("Writing to %s" % fnout)
fout = open(fnout, 'w')
fns_in = args.fns_in
if not fns_in:
fns_in = glob.glob('specimen_*/timing3.csv')
run(fout=fout, fns_in=fns_in, corner=args.corner, verbose=args.verbose)
if __name__ == '__main__':
main()


@ -0,0 +1,71 @@
#!/usr/bin/env python3
from timfuz import Benchmark, loadc_Ads_bs, index_names, load_sub, run_sub_json, instances
import glob  # used by the specimen_*/timing3.csv fallback in main()
def gen_group(fnin, sub_json, strict=False, verbose=False):
print('Loading data')
Ads, bs = loadc_Ads_bs([fnin], ico=True)
print('Sub: %u rows' % len(Ads))
iold = instances(Ads)
names_old = index_names(Ads)
run_sub_json(Ads, sub_json, strict=strict, verbose=verbose)
names = index_names(Ads)
print("Sub: %u => %u names" % (len(names_old), len(names)))
print('Sub: %u => %u instances' % (iold, instances(Ads)))
for row_ds, row_bs in zip(Ads, bs):
yield row_ds, row_bs
def run(fns_in, fnout, sub_json, strict=False, verbose=False):
with open(fnout, 'w') as fout:
fout.write('ico,fast_max fast_min slow_max slow_min,rows...\n')
for fn_in in fns_in:
for row_ds, row_bs in gen_group(fn_in, sub_json, strict=strict):
row_ico = 1
items = [str(row_ico), ' '.join([str(x) for x in row_bs])]
for k, v in sorted(row_ds.items()):
items.append('%u %s' % (v, k))
fout.write(','.join(items) + '\n')
def main():
import argparse
parser = argparse.ArgumentParser(
description='Substitute .csv to group correlated variables')
parser.add_argument('--verbose', action='store_true', help='')
parser.add_argument('--strict', action='store_true', help='')
parser.add_argument('--sub-csv', help='')
parser.add_argument(
'--sub-json',
required=True,
help='Group substitutions to make fully ranked')
parser.add_argument('--out', help='Output sub.json substitution result')
parser.add_argument('fns_in', nargs='+', help='timing3.csv input files')
args = parser.parse_args()
# Store options in dict to ease passing through functions
bench = Benchmark()
fns_in = args.fns_in
if not fns_in:
fns_in = glob.glob('specimen_*/timing3.csv')
sub_json = load_sub(args.sub_json)
try:
run(
fns_in,
args.out,
sub_json=sub_json,
strict=args.strict,
verbose=args.verbose)
finally:
print('Exiting after %s' % bench)
if __name__ == '__main__':
main()


@ -0,0 +1,98 @@
#!/usr/bin/env python3
from timfuz import Benchmark, loadc_Ads_bs, load_sub, Ads2bounds, corners2csv, corner_s2i
def gen_flat(fns_in, sub_json, corner=None):
Ads, bs = loadc_Ads_bs(fns_in, ico=True)
bounds = Ads2bounds(Ads, bs)
zeros = set()
nonzeros = set()
for bound_name, bound_bs in bounds.items():
sub = sub_json['subs'].get(bound_name, None)
if sub:
# put entire delay into pivot
pivot = sub_json['pivots'][bound_name]
assert pivot not in zeros
nonzeros.add(pivot)
non_pivot = set(sub.keys() - set([pivot]))
#for name in non_pivot:
# assert name not in nonzeros, (pivot, name, nonzeros)
zeros.update(non_pivot)
yield pivot, bound_bs
else:
nonzeros.add(bound_name)
yield bound_name, bound_bs
# non-pivots can appear multiple times, but they should always be zero
# however, due to substitution limitations, just warn
violations = zeros.intersection(nonzeros)
if len(violations):
print('WARNING: %s non-0 non-pivot' % (len(violations)))
# XXX: how to best handle these?
# should they be fixed 0?
if corner:
zero_row = [None, None, None, None]
zero_row[corner_s2i[corner]] = 0
for zero in zeros - violations:
yield zero, zero_row
def run(fns_in, fnout, sub_json, corner=None, sort=False, verbose=False):
'''
if sort:
sortf = sorted
else:
sortf = lambda x: x
'''
with open(fnout, 'w') as fout:
fout.write('ico,fast_max fast_min slow_max slow_min,rows...\n')
#for name, corners in sortf(gen_flat(fnin, sub_json)):
for name, corners in gen_flat(fns_in, sub_json, corner=corner):
row_ico = 1
items = [str(row_ico), corners2csv(corners)]
items.append('%u %s' % (1, name))
fout.write(','.join(items) + '\n')
def main():
import argparse
parser = argparse.ArgumentParser(
description='Substitute .csv to ungroup correlated variables')
parser.add_argument('--verbose', action='store_true', help='')
#parser.add_argument('--sort', action='store_true', help='')
parser.add_argument('--sub-csv', help='')
parser.add_argument(
'--sub-json',
required=True,
help='Group substitutions to make fully ranked')
parser.add_argument('--corner', default=None, help='')
parser.add_argument('--out', default=None, help='output timing delay .csv')
parser.add_argument(
'fns_in',
nargs='+',
help='input timing delay .csv (NOTE: must be single column)')
args = parser.parse_args()
# Store options in dict to ease passing through functions
bench = Benchmark()
sub_json = load_sub(args.sub_json)
try:
run(
args.fns_in,
args.out,
sub_json=sub_json,
#sort=args.sort,
verbose=args.verbose,
corner=args.corner)
finally:
print('Exiting after %s' % bench)
if __name__ == '__main__':
main()


@ -0,0 +1,3 @@
specimen_*
build


@ -0,0 +1,4 @@
all:
echo "FIXME: tie projects together"
false


@ -0,0 +1,47 @@
# Run corner specific calculations
TIMFUZ_DIR=$(XRAY_DIR)/fuzzers/007-timing
CORNER=slow_max
ALLOW_ZERO_EQN?=N
BADPRJ_OK?=N
all: build/$(CORNER)/timgrid-vc.json
run:
$(MAKE) clean
$(MAKE) all
touch run.ok
clean:
rm -rf specimen_[0-9][0-9][0-9]/ seg_clblx.segbits __pycache__ run.ok
rm -rf vivado*.log vivado_*.str vivado*.jou design *.bits *.dcp *.bit
rm -rf build
.PHONY: all run clean
build/$(CORNER):
mkdir -p build/$(CORNER)
build/checksub:
# checksub is produced by project.mk; corner.mk is not meant to run standalone
false
build/$(CORNER)/leastsq.csv: build/sub.json build/grouped.csv build/checksub build/$(CORNER)
# Create a rough timing model that approximately fits the given paths
python3 $(TIMFUZ_DIR)/solve_leastsq.py --sub-json build/sub.json build/grouped.csv --corner $(CORNER) --out build/$(CORNER)/leastsq.csv.tmp
mv build/$(CORNER)/leastsq.csv.tmp build/$(CORNER)/leastsq.csv
build/$(CORNER)/linprog.csv: build/$(CORNER)/leastsq.csv build/grouped.csv
# Tweak rough timing model, making sure all constraints are satisfied
ALLOW_ZERO_EQN=$(ALLOW_ZERO_EQN) python3 $(TIMFUZ_DIR)/solve_linprog.py --sub-json build/sub.json --sub-csv build/$(CORNER)/leastsq.csv --massage build/grouped.csv --corner $(CORNER) --out build/$(CORNER)/linprog.csv.tmp
mv build/$(CORNER)/linprog.csv.tmp build/$(CORNER)/linprog.csv
build/$(CORNER)/flat.csv: build/$(CORNER)/linprog.csv
# Take separated variables and back-annotate them to the original timing variables
python3 $(TIMFUZ_DIR)/csv_group2flat.py --sub-json build/sub.json --corner $(CORNER) --out build/$(CORNER)/flat.csv.tmp build/$(CORNER)/linprog.csv
mv build/$(CORNER)/flat.csv.tmp build/$(CORNER)/flat.csv
build/$(CORNER)/timgrid-vc.json: build/$(CORNER)/flat.csv
# Final processing
# Insert timing delays into actual tile layouts
python3 $(TIMFUZ_DIR)/tile_annotate.py --timgrid-s $(TIMFUZ_DIR)/timgrid/build/timgrid-s.json --out build/$(CORNER)/timgrid-vc.json build/$(CORNER)/flat.csv


@ -0,0 +1,9 @@
#!/bin/bash
source ${XRAY_GENHEADER}
TIMFUZ_DIR=$XRAY_DIR/fuzzers/007-timing
timing_txt2csv () {
python3 $TIMFUZ_DIR/timing_txt2csv.py --speed-json $TIMFUZ_DIR/speed/build/speed.json --out timing3.csv timing3.txt
}


@ -0,0 +1,4 @@
# so simple that likely no linprog adjustments will be needed
ALLOW_ZERO_EQN?=Y
include ../project.mk


@ -0,0 +1,2 @@
Minimal, simple test


@ -0,0 +1,8 @@
#!/bin/bash
set -ex
source ../generate.sh
vivado -mode batch -source ../generate.tcl
timing_txt2csv


@ -0,0 +1,47 @@
source ../../../../../utils/utils.tcl
source ../../project.tcl
proc build_design {} {
create_project -force -part $::env(XRAY_PART) design design
read_verilog ../top.v
synth_design -top top
puts "Locking pins"
set_property LOCK_PINS {I0:A1 I1:A2 I2:A3 I3:A4 I4:A5 I5:A6} \
[get_cells -quiet -filter {REF_NAME == LUT6} -hierarchical]
puts "Package stuff"
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_00) IOSTANDARD LVCMOS33" [get_ports clk]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_01) IOSTANDARD LVCMOS33" [get_ports stb]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_02) IOSTANDARD LVCMOS33" [get_ports di]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_03) IOSTANDARD LVCMOS33" [get_ports do]
puts "pblocking"
create_pblock roi
set roipb [get_pblocks roi]
set_property EXCLUDE_PLACEMENT 1 $roipb
add_cells_to_pblock $roipb [get_cells roi]
resize_pblock $roipb -add "$::env(XRAY_ROI)"
puts "randplace"
#randplace_pblock 50 roi
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]
set_property BITSTREAM.GENERAL.PERFRAMECRC YES [current_design]
puts "dedicated route"
set_property CLOCK_DEDICATED_ROUTE FALSE [get_nets clk_IBUF]
place_design
route_design
write_checkpoint -force design.dcp
# disable combinatorial loop
# set_property IS_ENABLED 0 [get_drc_checks {LUTLP-1}]
#write_bitstream -force design.bit
}
build_design
write_info3


@ -0,0 +1,37 @@
module roi (
input wire clk,
output wire out);
reg [23:0] counter;
assign out = counter[23] ^ counter[22] ^ counter[2] && counter[1] || counter[0];
always @(posedge clk) begin
counter <= counter + 1;
end
endmodule
module top(input wire clk, input wire stb, input wire di, output wire do);
localparam integer DIN_N = 0;
localparam integer DOUT_N = 1;
reg [DIN_N-1:0] din;
wire [DOUT_N-1:0] dout;
reg [DIN_N-1:0] din_shr;
reg [DOUT_N-1:0] dout_shr;
always @(posedge clk) begin
din_shr <= {din_shr, di};
dout_shr <= {dout_shr, din_shr[DIN_N-1]};
if (stb) begin
din <= din_shr;
dout_shr <= dout;
end
end
assign do = dout_shr[DOUT_N-1];
roi roi(
.clk(clk),
.out(dout[0])
);
endmodule


@ -0,0 +1,2 @@
include ../project.mk


@ -0,0 +1,3 @@
picorv32 uses more advanced structures than simple LUT/FF tests (such as carry chains).
It attempts to provide some realistic timing data with respect to fanout and used fabric.


@ -0,0 +1,8 @@
#!/bin/bash
set -ex
source ../generate.sh
vivado -mode batch -source ../generate.tcl
timing_txt2csv


@ -0,0 +1,48 @@
source ../../../../../utils/utils.tcl
source ../../project.tcl
proc build_design {} {
create_project -force -part $::env(XRAY_PART) design design
read_verilog ../../../src/picorv32.v
read_verilog ../top.v
synth_design -top top
puts "Locking pins"
set_property LOCK_PINS {I0:A1 I1:A2 I2:A3 I3:A4 I4:A5 I5:A6} \
[get_cells -quiet -filter {REF_NAME == LUT6} -hierarchical]
puts "Package stuff"
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_00) IOSTANDARD LVCMOS33" [get_ports clk]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_01) IOSTANDARD LVCMOS33" [get_ports stb]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_02) IOSTANDARD LVCMOS33" [get_ports di]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_03) IOSTANDARD LVCMOS33" [get_ports do]
puts "pblocking"
create_pblock roi
set roipb [get_pblocks roi]
set_property EXCLUDE_PLACEMENT 1 $roipb
add_cells_to_pblock $roipb [get_cells roi]
resize_pblock $roipb -add "$::env(XRAY_ROI)"
puts "randplace"
randplace_pblock 50 roi
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]
set_property BITSTREAM.GENERAL.PERFRAMECRC YES [current_design]
puts "dedicated route"
set_property CLOCK_DEDICATED_ROUTE FALSE [get_nets clk_IBUF]
place_design
route_design
write_checkpoint -force design.dcp
# disable combinatorial loop
# set_property IS_ENABLED 0 [get_drc_checks {LUTLP-1}]
#write_bitstream -force design.bit
}
build_design
write_info3


@ -0,0 +1,109 @@
//move some stuff to minitests/ncy0
`define SEED 32'h12345678
module top(input clk, stb, di, output do);
localparam integer DIN_N = 42;
localparam integer DOUT_N = 79;
reg [DIN_N-1:0] din;
wire [DOUT_N-1:0] dout;
reg [DIN_N-1:0] din_shr;
reg [DOUT_N-1:0] dout_shr;
always @(posedge clk) begin
din_shr <= {din_shr, di};
dout_shr <= {dout_shr, din_shr[DIN_N-1]};
if (stb) begin
din <= din_shr;
dout_shr <= dout;
end
end
assign do = dout_shr[DOUT_N-1];
roi #(.DIN_N(DIN_N), .DOUT_N(DOUT_N))
roi (
.clk(clk),
.din(din),
.dout(dout)
);
endmodule
module roi(input clk, input [DIN_N-1:0] din, output [DOUT_N-1:0] dout);
parameter integer DIN_N = -1;
parameter integer DOUT_N = -1;
/*
//Take out for now to make sure LUTs are more predictable
picorv32 picorv32 (
.clk(clk),
.resetn(din[0]),
.mem_valid(dout[0]),
.mem_instr(dout[1]),
.mem_ready(din[1]),
.mem_addr(dout[33:2]),
.mem_wdata(dout[66:34]),
.mem_wstrb(dout[70:67]),
.mem_rdata(din[33:2])
);
*/
/*
randluts randluts (
.din(din[41:34]),
.dout(dout[78:71])
);
*/
randluts #(.N(150)) randluts (
.din(din[41:34]),
.dout(dout[78:71])
);
endmodule
module randluts(input [7:0] din, output [7:0] dout);
parameter integer N = 250;
function [31:0] xorshift32(input [31:0] xorin);
begin
xorshift32 = xorin;
xorshift32 = xorshift32 ^ (xorshift32 << 13);
xorshift32 = xorshift32 ^ (xorshift32 >> 17);
xorshift32 = xorshift32 ^ (xorshift32 << 5);
end
endfunction
function [63:0] lutinit(input [7:0] a, b);
begin
lutinit[63:32] = xorshift32(xorshift32(xorshift32(xorshift32({a, b} ^ `SEED))));
lutinit[31: 0] = xorshift32(xorshift32(xorshift32(xorshift32({b, a} ^ `SEED))));
end
endfunction
wire [(N+1)*8-1:0] nets;
assign nets[7:0] = din;
assign dout = nets[(N+1)*8-1:N*8];
genvar i, j;
generate
for (i = 0; i < N; i = i+1) begin:is
for (j = 0; j < 8; j = j+1) begin:js
localparam integer k = xorshift32(xorshift32(xorshift32(xorshift32((i << 20) ^ (j << 10) ^ `SEED)))) & 255;
(* KEEP, DONT_TOUCH *)
LUT6 #(
.INIT(lutinit(i, j))
) lut (
.I0(nets[8*i+(k+0)%8]),
.I1(nets[8*i+(k+1)%8]),
.I2(nets[8*i+(k+2)%8]),
.I3(nets[8*i+(k+3)%8]),
.I4(nets[8*i+(k+4)%8]),
.I5(nets[8*i+(k+5)%8]),
.O(nets[8*i+8+j])
);
end
end
endgenerate
endmodule


@ -0,0 +1,2 @@
include ../project.mk


@ -0,0 +1,2 @@
LUTs are physically laid out in an array and directly connected to a test pattern generator.


@ -0,0 +1,92 @@
#!/usr/bin/env python
import argparse
parser = argparse.ArgumentParser(description='')
parser.add_argument('--sdx', default='8', help='')
parser.add_argument('--sdy', default='4', help='')
args = parser.parse_args()
'''
Generate in pairs
Fill up switchbox quad for now
Create random connections between the LUTs
See how much routing pressure we can generate
Start with non-random connections to the LFSR for solver comparison
Start at SLICE_X16Y102
'''
SBASE = (16, 102)
SDX = int(args.sdx, 0)
SDY = int(args.sdy, 0)
nlut = 4 * SDX * SDY
nin = 6 * nlut
nout = nlut
print('//placelut simple')
print('//SBASE: %s' % (SBASE, ))
print('//SDX: %s' % (SDX, ))
print('//SDY: %s' % (SDY, ))
print('//nlut: %s' % (nlut, ))
print(
'''\
module roi (
input wire clk,
input wire [%u:0] ins,
output wire [%u:0] outs);''') % (nin - 1, nout - 1)
ini = 0
outi = 0
for lutx in xrange(SBASE[0], SBASE[0] + SDX):
for luty in xrange(SBASE[1], SBASE[1] + SDY):
loc = "SLICE_X%uY%u" % (lutx, luty)
for belc in 'ABCD':
bel = '%c6LUT' % belc
print(
'''\
(* KEEP, DONT_TOUCH, LOC="%s", BEL="%s" *)
LUT6 #(
.INIT(64'hBAD1DEA_1DEADCE0)
) %s (''') % (loc, bel, 'lut_x%uy%u_%c' % (lutx, luty, belc))
for i in xrange(6):
print('''\
.I%u(ins[%u]),''' % (i, ini))
ini += 1
print('''\
.O(outs[%u]));''') % (outi, )
outi += 1
assert nin == ini
assert nout == outi
print(
'''
endmodule
module top(input wire clk, input wire stb, input wire di, output wire do);
localparam integer DIN_N = %u;
localparam integer DOUT_N = %u;
reg [DIN_N-1:0] din;
wire [DOUT_N-1:0] dout;
reg [DIN_N-1:0] din_shr;
reg [DOUT_N-1:0] dout_shr;
always @(posedge clk) begin
din_shr <= {din_shr, di};
dout_shr <= {dout_shr, din_shr[DIN_N-1]};
if (stb) begin
din <= din_shr;
dout_shr <= dout;
end
end
assign do = dout_shr[DOUT_N-1];
roi roi(
.clk(clk),
.ins(din),
.outs(dout)
);
endmodule''') % (nin, nout)


@ -0,0 +1,11 @@
#!/bin/bash
set -ex
source ${XRAY_GENHEADER}
TIMFUZ_DIR=$XRAY_DIR/fuzzers/007-timing
python ../generate.py --sdx 4 --sdy 4 >top.v
vivado -mode batch -source ../generate.tcl
python3 $TIMFUZ_DIR/timing_txt2csv.py --speed-json $TIMFUZ_DIR/speed/build/speed.json --out timing3.csv timing3.txt


@ -0,0 +1,47 @@
source ../../../../../utils/utils.tcl
source ../../project.tcl
proc build_design {} {
create_project -force -part $::env(XRAY_PART) design design
read_verilog top.v
synth_design -top top
puts "Locking pins"
set_property LOCK_PINS {I0:A1 I1:A2 I2:A3 I3:A4 I4:A5 I5:A6} \
[get_cells -quiet -filter {REF_NAME == LUT6} -hierarchical]
puts "Package stuff"
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_00) IOSTANDARD LVCMOS33" [get_ports clk]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_01) IOSTANDARD LVCMOS33" [get_ports stb]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_02) IOSTANDARD LVCMOS33" [get_ports di]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_03) IOSTANDARD LVCMOS33" [get_ports do]
puts "pblocking"
create_pblock roi
set roipb [get_pblocks roi]
set_property EXCLUDE_PLACEMENT 1 $roipb
add_cells_to_pblock $roipb [get_cells roi]
resize_pblock $roipb -add "$::env(XRAY_ROI)"
puts "randplace"
randplace_pblock 50 roi
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]
set_property BITSTREAM.GENERAL.PERFRAMECRC YES [current_design]
puts "dedicated route"
set_property CLOCK_DEDICATED_ROUTE FALSE [get_nets clk_IBUF]
place_design
route_design
write_checkpoint -force design.dcp
# disable combinatorial loop
# set_property IS_ENABLED 0 [get_drc_checks {LUTLP-1}]
#write_bitstream -force design.bit
}
build_design
write_info3


@ -0,0 +1,2 @@
include ../project.mk


@ -0,0 +1,2 @@
LUTs are physically laid out in an array, connected to a test pattern generator, and fed back to other LUTs.


@ -0,0 +1,103 @@
#!/usr/bin/env python
'''
Note: vivado will (by default) fail bitgen DRC on LUT feedback loops
Looks like can probably be disabled, but we actually don't need a bitstream for timing analysis
'''
import argparse
import random
random.seed()
parser = argparse.ArgumentParser(description='')
parser.add_argument('--sdx', default='8', help='')
parser.add_argument('--sdy', default='4', help='')
args = parser.parse_args()
'''
Generate in pairs
Fill up switchbox quad for now
Create random connections between the LUTs
See how much routing pressure we can generate
Start with non-random connections to the LFSR for solver comparison
Start at SLICE_X16Y102
'''
SBASE = (16, 102)
SDX = int(args.sdx, 0)
SDY = int(args.sdy, 0)
nlut = 4 * SDX * SDY
nin = 6 * nlut
nout = nlut
print('//placelut w/ feedback')
print('//SBASE: %s' % (SBASE, ))
print('//SDX: %s' % (SDX, ))
print('//SDY: %s' % (SDY, ))
print('//nlut: %s' % (nlut, ))
print(
'''\
module roi (
input wire clk,
input wire [%u:0] ins,
output wire [%u:0] outs);''') % (nin - 1, nout - 1)
ini = 0
outi = 0
for lutx in xrange(SBASE[0], SBASE[0] + SDX):
for luty in xrange(SBASE[1], SBASE[1] + SDY):
loc = "SLICE_X%uY%u" % (lutx, luty)
for belc in 'ABCD':
bel = '%c6LUT' % belc
print(
'''\
(* KEEP, DONT_TOUCH, LOC="%s", BEL="%s" *)
LUT6 #(
.INIT(64'hBAD1DEA_1DEADCE0)
) %s (''') % (loc, bel, 'lut_x%uy%u_%c' % (lutx, luty, belc))
for i in xrange(6):
if random.randint(0, 9) < 1:
wfrom = 'ins[%u]' % ini
ini += 1
else:
wfrom = 'outs[%u]' % random.randint(0, nout - 1)
print('''\
.I%u(%s),''' % (i, wfrom))
print('''\
.O(outs[%u]));''') % (outi, )
outi += 1
#assert nin == ini
assert nout == outi
print(
'''
endmodule
module top(input wire clk, input wire stb, input wire di, output wire do);
localparam integer DIN_N = %u;
localparam integer DOUT_N = %u;
reg [DIN_N-1:0] din;
wire [DOUT_N-1:0] dout;
reg [DIN_N-1:0] din_shr;
reg [DOUT_N-1:0] dout_shr;
always @(posedge clk) begin
din_shr <= {din_shr, di};
dout_shr <= {dout_shr, din_shr[DIN_N-1]};
if (stb) begin
din <= din_shr;
dout_shr <= dout;
end
end
assign do = dout_shr[DOUT_N-1];
roi roi(
.clk(clk),
.ins(din),
.outs(dout)
);
endmodule''') % (nin, nout)


@ -0,0 +1,9 @@
#!/bin/bash
set -ex
source ../generate.sh
python ../generate.py --sdx 4 --sdy 4 >top.v
vivado -mode batch -source ../generate.tcl
timing_txt2csv


@ -0,0 +1,47 @@
source ../../../../../utils/utils.tcl
source ../../project.tcl
proc build_design {} {
create_project -force -part $::env(XRAY_PART) design design
read_verilog top.v
synth_design -top top
puts "Locking pins"
set_property LOCK_PINS {I0:A1 I1:A2 I2:A3 I3:A4 I4:A5 I5:A6} \
[get_cells -quiet -filter {REF_NAME == LUT6} -hierarchical]
puts "Package stuff"
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_00) IOSTANDARD LVCMOS33" [get_ports clk]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_01) IOSTANDARD LVCMOS33" [get_ports stb]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_02) IOSTANDARD LVCMOS33" [get_ports di]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_03) IOSTANDARD LVCMOS33" [get_ports do]
puts "pblocking"
create_pblock roi
set roipb [get_pblocks roi]
set_property EXCLUDE_PLACEMENT 1 $roipb
add_cells_to_pblock $roipb [get_cells roi]
resize_pblock $roipb -add "$::env(XRAY_ROI)"
puts "randplace"
randplace_pblock 50 roi
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]
set_property BITSTREAM.GENERAL.PERFRAMECRC YES [current_design]
puts "dedicated route"
set_property CLOCK_DEDICATED_ROUTE FALSE [get_nets clk_IBUF]
place_design
route_design
write_checkpoint -force design.dcp
# disable combinatorial loop
# set_property IS_ENABLED 0 [get_drc_checks {LUTLP-1}]
#write_bitstream -force design.bit
}
build_design
write_info3


@ -0,0 +1,2 @@
include ../project.mk


@ -0,0 +1,3 @@
LUTs are physically laid out in an array and connected to test pattern generator and fed back to other LUTs.
FFs are randomly inserted between connections.


@ -0,0 +1,127 @@
#!/usr/bin/env python
'''
Note: vivado will (by default) fail bitgen DRC on LUT feedback loops
Looks like can probably be disabled, but we actually don't need a bitstream for timing analysis
ERROR: [Vivado 12-2285] Cannot set LOC property of instance 'roi/lut_x22y102_D', Instance roi/lut_x22y102_D can not be placed in D6LUT of site SLICE_X18Y103 because the bel is occupied by roi/lut_x18y103_D(port:). This could be caused by bel constraint conflict
Resolution: When using BEL constraints, ensure the BEL constraints are defined before the LOC constraints to avoid conflicts at a given site.
'''
import argparse
import random
random.seed()
parser = argparse.ArgumentParser(description='')
parser.add_argument('--sdx', default='8', help='')
parser.add_argument('--sdy', default='4', help='')
args = parser.parse_args()
'''
Generate in pairs
Fill up switchbox quad for now
Create random connections between the LUTs
See how much routing pressure we can generate
Start with non-random connections to the LFSR for solver comparison
Start at SLICE_X16Y102
'''
SBASE = (16, 102)
SDX = int(args.sdx, 0)
SDY = int(args.sdy, 0)
nlut = 4 * SDX * SDY
nin = 6 * nlut
nout = nlut
print('//placelut w/ FF + feedback')
print('//SBASE: %s' % (SBASE, ))
print('//SDX: %s' % (SDX, ))
print('//SDY: %s' % (SDY, ))
print('//nlut: %s' % (nlut, ))
print(
'''\
module roi (
input wire clk,
input wire [%u:0] ins,
output wire [%u:0] outs);''') % (nin - 1, nout - 1)
ini = 0
outi = 0
for lutx in xrange(SBASE[0], SBASE[0] + SDX):
for luty in xrange(SBASE[1], SBASE[1] + SDY):
loc = "SLICE_X%uY%u" % (lutx, luty)
for belc in 'ABCD':
bel = '%c6LUT' % belc
name = 'lut_x%uy%u_%c' % (lutx, luty, belc)
print(
'''\
(* KEEP, DONT_TOUCH, LOC="%s", BEL="%s" *)
LUT6 #(
.INIT(64'hBAD1DEA_1DEADCE0)
) %s (''') % (loc, bel, name)
for i in xrange(6):
rval = random.randint(0, 9)
if rval < 3:
wfrom = 'ins[%u]' % ini
ini += 1
#elif rval < 6:
# wfrom = 'outsr[%u]' % random.randint(0, nout - 1)
else:
wfrom = 'outs[%u]' % random.randint(0, nout - 1)
print('''\
.I%u(%s),''' % (i, wfrom))
out_w = name + '_o'
print('''\
.O(%s));''') % (out_w, )
outs_w = "outs[%u]" % outi
if random.randint(0, 9) < 5:
print(' assign %s = %s;' % (outs_w, out_w))
else:
out_r = name + '_or'
print(
'''\
reg %s;
assign %s = %s;
always @(posedge clk) begin
%s = %s;
end
''' % (out_r, outs_w, out_r, out_r, out_w))
outi += 1
#assert nin == ini
assert nout == outi
print(
'''
endmodule
module top(input wire clk, input wire stb, input wire di, output wire do);
localparam integer DIN_N = %u;
localparam integer DOUT_N = %u;
reg [DIN_N-1:0] din;
wire [DOUT_N-1:0] dout;
reg [DIN_N-1:0] din_shr;
reg [DOUT_N-1:0] dout_shr;
always @(posedge clk) begin
din_shr <= {din_shr, di};
dout_shr <= {dout_shr, din_shr[DIN_N-1]};
if (stb) begin
din <= din_shr;
dout_shr <= dout;
end
end
assign do = dout_shr[DOUT_N-1];
roi roi(
.clk(clk),
.ins(din),
.outs(dout)
);
endmodule''') % (nin, nout)


@ -0,0 +1,9 @@
#!/bin/bash
set -ex
source ../generate.sh
python ../generate.py --sdx 4 --sdy 4 >top.v
vivado -mode batch -source ../generate.tcl
timing_txt2csv


@ -0,0 +1,47 @@
source ../../../../../utils/utils.tcl
source ../../project.tcl
proc build_design {} {
create_project -force -part $::env(XRAY_PART) design design
read_verilog top.v
synth_design -top top
puts "Locking pins"
set_property LOCK_PINS {I0:A1 I1:A2 I2:A3 I3:A4 I4:A5 I5:A6} \
[get_cells -quiet -filter {REF_NAME == LUT6} -hierarchical]
puts "Package stuff"
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_00) IOSTANDARD LVCMOS33" [get_ports clk]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_01) IOSTANDARD LVCMOS33" [get_ports stb]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_02) IOSTANDARD LVCMOS33" [get_ports di]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_03) IOSTANDARD LVCMOS33" [get_ports do]
puts "pblocking"
create_pblock roi
set roipb [get_pblocks roi]
set_property EXCLUDE_PLACEMENT 1 $roipb
add_cells_to_pblock $roipb [get_cells roi]
resize_pblock $roipb -add "$::env(XRAY_ROI)"
puts "randplace"
randplace_pblock 50 roi
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]
set_property BITSTREAM.GENERAL.PERFRAMECRC YES [current_design]
puts "dedicated route"
set_property CLOCK_DEDICATED_ROUTE FALSE [get_nets clk_IBUF]
place_design
route_design
write_checkpoint -force design.dcp
# disable combinatorial loop
# set_property IS_ENABLED 0 [get_drc_checks {LUTLP-1}]
#write_bitstream -force design.bit
}
build_design
write_info3


@ -0,0 +1,77 @@
# project.mk: build specimens (run vivado), compute rref
# corner.mk: run corner specific calculations
N := 1
SPECIMENS := $(addprefix specimen_,$(shell seq -f '%03.0f' $(N)))
SPECIMENS_OK := $(addsuffix /OK,$(SPECIMENS))
CSVS := $(addsuffix /timing3.csv,$(SPECIMENS))
TIMFUZ_DIR=$(XRAY_DIR)/fuzzers/007-timing
RREF_CORNER=slow_max
ALLOW_ZERO_EQN?=N
BADPRJ_OK?=N
TIMGRID_VCS=build/fast_max/timgrid-vc.json build/fast_min/timgrid-vc.json build/slow_max/timgrid-vc.json build/slow_min/timgrid-vc.json
all: build/timgrid-v.json
# make build/checksub first
build/fast_max/timgrid-vc.json: build/checksub
$(MAKE) -f $(TIMFUZ_DIR)/projects/corner.mk CORNER=fast_max
build/fast_min/timgrid-vc.json: build/checksub
$(MAKE) -f $(TIMFUZ_DIR)/projects/corner.mk CORNER=fast_min
build/slow_max/timgrid-vc.json: build/checksub
$(MAKE) -f $(TIMFUZ_DIR)/projects/corner.mk CORNER=slow_max
build/slow_min/timgrid-vc.json: build/checksub
$(MAKE) -f $(TIMFUZ_DIR)/projects/corner.mk CORNER=slow_min
$(SPECIMENS_OK):
bash generate.sh $(subst /OK,,$@) || (if [ "$(BADPRJ_OK)" != 'Y' ] ; then exit 1; fi; exit 0)
touch $@
run:
$(MAKE) clean
$(MAKE) all
touch run.ok
clean:
rm -rf specimen_[0-9][0-9][0-9]/ seg_clblx.segbits __pycache__ run.ok
rm -rf vivado*.log vivado_*.str vivado*.jou design *.bits *.dcp *.bit
rm -rf build
.PHONY: all run clean
# Normally require all projects to complete
# If BADPRJ_OK is allowed, only take projects that were successful
# FIXME: couldn't get call to work
exist_csvs = \
for f in $(CSVS); do \
if [ "$(BADPRJ_OK)" != 'Y' -o -f $$f ] ; then \
echo $$f; \
fi; \
done
# rref should be the same regardless of corner
build/sub.json: $(SPECIMENS_OK)
mkdir -p build
# Discover which variables can be separated
# This is typically the longest running operation
\
csvs=$$(for f in $(CSVS); do if [ "$(BADPRJ_OK)" != 'Y' -o -f $$f ] ; then echo $$f; fi; done) ; \
python3 $(TIMFUZ_DIR)/rref.py --corner $(RREF_CORNER) --simplify --out build/sub.json.tmp $$csvs
mv build/sub.json.tmp build/sub.json
build/grouped.csv: $(SPECIMENS_OK) build/sub.json
# Separate variables
\
csvs=$$(for f in $(CSVS); do if [ "$(BADPRJ_OK)" != 'Y' -o -f $$f ] ; then echo $$f; fi; done) ; \
python3 $(TIMFUZ_DIR)/csv_flat2group.py --sub-json build/sub.json --strict --out build/grouped.csv.tmp $$csvs
mv build/grouped.csv.tmp build/grouped.csv
build/checksub: build/grouped.csv build/sub.json
# Verify sub.json makes a cleanly solvable solution with no non-pivot leftover
python3 $(TIMFUZ_DIR)/checksub.py --sub-json build/sub.json build/grouped.csv
touch build/checksub
build/timgrid-v.json: $(TIMGRID_VCS)
python3 $(TIMFUZ_DIR)/timgrid_vc2v.py --out build/timgrid-v.json $(TIMGRID_VCS)


@ -0,0 +1,174 @@
proc pin_info {pin} {
set cell [get_cells -of_objects $pin]
set bel [get_bels -of_objects $cell]
set site [get_sites -of_objects $bel]
return "$site $bel"
}
proc pin_bel {pin} {
set cell [get_cells -of_objects $pin]
set bel [get_bels -of_objects $cell]
return $bel
}
# Changed to group wires and nodes
# This allows tracing the full path along with pips
proc write_info3 {} {
set outdir "."
set fp [open "$outdir/timing3.txt" w]
# bel as site/bel, so don't bother with site
puts $fp "net src_bel dst_bel ico fast_max fast_min slow_max slow_min pips inodes wires"
set TIME_start [clock clicks -milliseconds]
set verbose 0
set equations 0
set site_src_nets 0
set site_dst_nets 0
set neti 0
set nets [get_nets -hierarchical]
set nnets [llength $nets]
foreach net $nets {
incr neti
#if {$neti >= 10} {
# puts "Debug break"
# break
#}
puts "Net $neti / $nnets: $net"
# The semantics of get_pins -leaf is kind of odd
# When no passthrough LUTs exist, it has no effect
# When passthrough LUT present:
# -w/o -leaf: some pins + passthrough LUT pins
# -w/ -leaf: different pins + passthrough LUT pins
# With OUT filter this seems to be sufficient
set src_pin [get_pins -leaf -filter {DIRECTION == OUT} -of_objects $net]
set src_bel [pin_bel $src_pin]
set src_site [get_sites -of_objects $src_bel]
# Only one net driver
set src_site_pins [get_site_pins -filter {DIRECTION == OUT} -of_objects $net]
# Sometimes this is empty for reasons that escape me
# Emitting direction doesn't help
if {[llength $src_site_pins] < 1} {
if $verbose {
puts " Ignoring site internal net"
}
incr site_src_nets
continue
}
set dst_site_pins_net [get_site_pins -filter {DIRECTION == IN} -of_objects $net]
if {[llength $dst_site_pins_net] < 1} {
puts " Skipping site internal source net"
incr site_dst_nets
continue
}
foreach src_site_pin $src_site_pins {
if $verbose {
puts "Source: $src_pin at site $src_site:$src_bel, spin $src_site_pin"
}
# Run with and without interconnect only
foreach ico "0 1" {
set ico_flag ""
if $ico {
set ico_flag "-interconnect_only"
set delays [get_net_delays $ico_flag -of_objects $net]
} else {
set delays [get_net_delays -of_objects $net]
}
foreach delay $delays {
set delaystr [get_property NAME $delay]
set dst_pins [get_property TO_PIN $delay]
set dst_pin [get_pins $dst_pins]
#puts " $delaystr: $src_pin => $dst_pin"
set dst_bel [pin_bel $dst_pin]
set dst_site [get_sites -of_objects $dst_bel]
if $verbose {
puts " Dest: $dst_pin at site $dst_site:$dst_bel"
}
set dst_site_pins [get_site_pins -of_objects $dst_pin]
# Some nets are internal
# But should this happen on dest if we've already filtered source?
if {"$dst_site_pins" eq ""} {
continue
}
# Also, apparently you can have multiple of these as well
foreach dst_site_pin $dst_site_pins {
set fast_max [get_property "FAST_MAX" $delay]
set fast_min [get_property "FAST_MIN" $delay]
set slow_max [get_property "SLOW_MAX" $delay]
set slow_min [get_property "SLOW_MIN" $delay]
# Want:
# Site / BEL at src
# Site / BEL at dst
# Pips in between
# Walk net, looking for interesting elements in between
set pips [get_pips -of_objects $net -from $src_site_pin -to $dst_site_pin]
if $verbose {
foreach pip $pips {
puts " PIP $pip"
}
}
set nodes [get_nodes -of_objects $net -from $src_site_pin -to $dst_site_pin]
#set wires [get_wires -of_objects $net -from $src_site_pin -to $dst_site_pin]
set wires [get_wires -of_objects $nodes]
# puts $fp "$net $src_bel $dst_bel $ico $fast_max $fast_min $slow_max $slow_min $pips"
puts -nonewline $fp "$net $src_bel $dst_bel $ico $fast_max $fast_min $slow_max $slow_min"
# Write pips w/ speed index
puts -nonewline $fp " "
set needspace 0
foreach pip $pips {
if $needspace {
puts -nonewline $fp "|"
}
set speed_index [get_property SPEED_INDEX $pip]
puts -nonewline $fp "$pip:$speed_index"
set needspace 1
}
# Write nodes
#set nodes_str [string map {" " "|"} $nodes]
#puts -nonewline $fp " $nodes_str"
puts -nonewline $fp " "
set needspace 0
foreach node $nodes {
if $needspace {
puts -nonewline $fp "|"
}
set nwires [llength [get_wires -of_objects $node]]
puts -nonewline $fp "$node:$nwires"
set needspace 1
}
# Write wires
puts -nonewline $fp " "
set needspace 0
foreach wire $wires {
if $needspace {
puts -nonewline $fp "|"
}
set speed_index [get_property SPEED_INDEX $wire]
puts -nonewline $fp "$wire:$speed_index"
set needspace 1
}
puts $fp ""
incr equations
break
}
}
}
}
}
close $fp
set TIME_taken [expr [clock clicks -milliseconds] - $TIME_start]
puts "Took ms: $TIME_taken"
puts "Generated $equations equations"
puts "Skipped $site_src_nets (+ $site_dst_nets) site nets"
}

fuzzers/007-timing/rref.py

@ -0,0 +1,237 @@
#!/usr/bin/env python3
from timfuz import Benchmark, Ar_di2np, loadc_Ads_b, index_names, A_ds2np, simplify_rows
import numpy as np
import glob
import math
import json
import sympy
from collections import OrderedDict
from fractions import Fraction
def fracr_quick(r):
return [Fraction(numerator=int(x), denominator=1) for x in r]
def fracm_quick(m):
'''Convert integer matrix to Fraction matrix'''
t = type(m[0][0])
print('fracm_quick type: %s' % t)
return [fracr_quick(r) for r in m]
class State(object):
def __init__(self, Ads, drop_names=[]):
self.Ads = Ads
self.names = index_names(self.Ads)
# known zero delay elements
self.drop_names = set(drop_names)
# active names in rows
# includes sub variables, excludes variables that have been substituted out
self.base_names = set(self.names)
self.names = set(self.base_names)
# List of variable substitutions
# k => dict of v:n entries that it came from
self.subs = {}
self.verbose = True
def print_stats(self):
print("Stats")
print(" Substitutions: %u" % len(self.subs))
if self.subs:
print(
" Largest: %u" % max([len(x) for x in self.subs.values()]))
print(" Rows: %u" % len(self.Ads))
print(
" Cols (in): %u" % (len(self.base_names) + len(self.drop_names)))
print(" Cols (preprocessed): %u" % len(self.base_names))
print(" Drop names: %u" % len(self.drop_names))
print(" Cols (out): %u" % len(self.names))
print(" Solvable vars: %u" % len(self.names & self.base_names))
assert len(self.names) >= len(self.subs)
@staticmethod
def load(fn_ins, simplify=False, corner=None):
Ads, b = loadc_Ads_b(fn_ins, corner=corner, ico=True)
if simplify:
print('Simplifying corner %s' % (corner, ))
Ads, b = simplify_rows(Ads, b, remove_zd=False, corner=corner)
return State(Ads)
def write_state(state, fout):
j = {
'names': dict([(x, None) for x in state.names]),
'drop_names': list(state.drop_names),
'base_names': list(state.base_names),
'subs': dict([(name, values) for name, values in state.subs.items()]),
'pivots': state.pivots,
}
json.dump(j, fout, sort_keys=True, indent=4, separators=(',', ': '))
def Anp2matrix(Anp):
'''
Original idea was to make into a square matrix
but this loses too much information
so now this actually isn't doing anything and should probably be eliminated
'''
ncols = len(Anp[0])
A_ub2 = [np.zeros(ncols) for _i in range(ncols)]
dst_rowi = 0
for rownp in Anp:
A_ub2[dst_rowi] = np.add(A_ub2[dst_rowi], rownp)
dst_rowi = (dst_rowi + 1) % ncols
return A_ub2
def row_np2ds(rownp, names):
ret = {}
assert len(rownp) == len(names), (len(rownp), len(names))
for namei, name in enumerate(names):
v = rownp[namei]
if v:
ret[name] = v
return ret
def row_sym2dsf(rowsym, names):
'''Convert a sympy row into a dictionary of keys to (numerator, denominator) tuples'''
from sympy import fraction
ret = {}
assert len(rowsym) == len(names), (len(rowsym), len(names))
for namei, name in enumerate(names):
v = rowsym[namei]
if v:
(num, den) = fraction(v)
ret[name] = (int(num), int(den))
return ret
def state_rref(state, verbose=False):
print('Converting rows to integer keys')
names, Anp = A_ds2np(state.Ads)
print('np: %u rows x %u cols' % (len(Anp), len(Anp[0])))
if 0:
print('Combining rows into matrix')
mnp = Anp2matrix(Anp)
else:
mnp = Anp
print('Matrix: %u rows x %u cols' % (len(mnp), len(mnp[0])))
print('Converting np to sympy matrix')
mfrac = fracm_quick(mnp)
# doesn't seem to change anything
#msym = sympy.MutableSparseMatrix(mfrac)
msym = sympy.Matrix(mfrac)
# internal encoding has significant performance implications
#assert type(msym[0]) is sympy.Integer
if verbose:
print('names')
print(names)
print('Matrix')
sympy.pprint(msym)
print('Making rref')
rref, pivots = msym.rref(normalize_last=False)
if verbose:
print('Pivots')
sympy.pprint(pivots)
print('rref')
sympy.pprint(rref)
state.pivots = {}
def row_solved(rowsym, row_pivot):
for ci, c in enumerate(rowsym):
if ci == row_pivot:
continue
if c != 0:
return False
return True
#rrefnp = np.array(rref).astype(np.float64)
#print('Computing groups w/ rref %u row x %u col' % (len(rrefnp), len(rrefnp[0])))
#print(rrefnp)
# rows that have a single 1 are okay
# anything else requires substitution (unless all 0)
# pivots may be fewer than the rows
# remaining rows should be 0s
for row_i, row_pivot in enumerate(pivots):
rowsym = rref.row(row_i)
# yippee! nothing to report
if row_solved(rowsym, row_pivot):
continue
# a grouping
group_name = "GRP_%u" % row_i
rowdsf = row_sym2dsf(rowsym, names)
state.subs[group_name] = rowdsf
# Add the new variables
state.names.add(group_name)
# Remove substituted variables
# Note: variables may appear multiple times
state.names.difference_update(set(rowdsf.keys()))
pivot_name = names[row_pivot]
state.pivots[group_name] = pivot_name
if verbose:
print("%s (%s): %s" % (group_name, pivot_name, rowdsf))
return state
def run(fnout, fn_ins, simplify=False, corner=None, verbose=0):
print('Loading data')
assert len(fn_ins) > 0
state = State.load(fn_ins, simplify=simplify, corner=corner)
state_rref(state, verbose=verbose)
state.print_stats()
if fnout:
with open(fnout, 'w') as fout:
write_state(state, fout)
def main():
import argparse
parser = argparse.ArgumentParser(
description=
'Compute reduced row echelon form (RREF) to produce sub.json (variable groups)'
)
parser.add_argument('--verbose', action='store_true', help='')
parser.add_argument('--simplify', action='store_true', help='')
parser.add_argument('--corner', default="slow_max", help='')
parser.add_argument(
'--speed-json',
default='build_speed/speed.json',
help='Provides speed index to name translation')
parser.add_argument('--out', help='Output sub.json substitution result')
parser.add_argument('fns_in', nargs='*', help='timing3.csv input files')
args = parser.parse_args()
bench = Benchmark()
fns_in = args.fns_in
if not fns_in:
fns_in = glob.glob('specimen_*/timing3.csv')
try:
run(
fnout=args.out,
fn_ins=fns_in,
simplify=args.simplify,
corner=args.corner,
verbose=args.verbose)
finally:
print('Exiting after %s' % bench)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,190 @@
#!/usr/bin/env python3
from timfuz import Benchmark, load_sub, corner_s2i, acorner2csv
import timfuz
import numpy as np
import math
import sys
import os
import time
import glob
import timfuz_solve
import scipy.optimize as optimize
def mkestimate(Anp, b):
'''
Ballpark upper bound estimate assuming variables contribute all of the delay in their respective row
Return the min over all of the occurrences
XXX: should this be corner adjusted?
'''
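# e.g. (illustrative): given rows x0 + x1 >= 10 and x0 >= 4, x0 starts at
# 1e3, is lowered to 10 by the first row, then to 4 by the second;
# x1 settles at 10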
cols = len(Anp[0])
x0 = np.array([1e3 for _x in range(cols)])
for row_np, row_b in zip(Anp, b):
for coli, val in enumerate(row_np):
if val:
# Scale by the number of occurrences
ub = row_b / val
if ub >= 0:
x0[coli] = min(x0[coli], ub)
return x0
def save(outfn, xvals, names, corner):
# ballpark minimum actual observed delay is around 7 (carry chain)
# anything less than one is probably a solver artifact
delta = 0.5
corneri = corner_s2i[corner]
# Round conservatively
roundf = {
'fast_max': math.ceil,
'fast_min': math.floor,
'slow_max': math.ceil,
'slow_min': math.floor,
}[corner]
print('Writing results')
skips = 0
keeps = 0
with open(outfn, 'w') as fout:
# write as one variable per line
# this natively forms a bound if fed into linprog solver
fout.write('ico,fast_max fast_min slow_max slow_min,rows...\n')
for xval, name in zip(xvals, names):
row_ico = 1
# also review ceil vs floor choice for min vs max
# let's be more conservative for now
if xval < delta:
#print('Skipping %s: %0.6f' % (name, xval))
skips += 1
continue
keeps += 1
#xvali = round(xval)
items = [str(row_ico), acorner2csv(roundf(xval), corneri)]
items.append('%u %s' % (1, name))
fout.write(','.join(items) + '\n')
print(
'Wrote: skipped %u, kept %u / %u valid delays' % (skips, keeps, len(names)))
assert keeps, 'Failed to estimate delay'
def run_corner(
Anp, b, names, corner, verbose=False, opts={}, meta={}, outfn=None):
# Given timing scores for above delays (-ps)
assert type(Anp[0]) is np.ndarray, type(Anp[0])
assert type(b) is np.ndarray, type(b)
#check_feasible(Anp, b)
'''
Be mindful of signs
Have something like
timing1, timing2 are constants
delay1 + delay2 + delay4 >= timing1
delay2 + delay3 >= timing2
But need it in compliant form:
-delay1 + -delay2 + -delay4 <= -timing1
-delay2 + -delay3 <= -timing2
'''
rows = len(Anp)
cols = len(Anp[0])
print('Unique delay elements: %d' % len(names))
print('Input paths')
print(' # timing scores: %d' % len(b))
print(' Rows: %d' % rows)
'''
You must have at least as many things to optimize as variables
That is, the system must be plausibly constrained for it to attempt a solve
If not, you'll get a message like
TypeError: Improper input: N=3 must not exceed M=2
'''
if rows < cols:
raise Exception("rows must be >= cols")
tlast = [None]
iters = [0]
printn = [0]
def progress_print():
iters[0] += 1
if tlast[0] is None:
tlast[0] = time.time()
if time.time() - tlast[0] > 1.0:
sys.stdout.write('I:%d ' % iters[0])
tlast[0] = time.time()
printn[0] += 1
if printn[0] % 10 == 0:
sys.stdout.write('\n')
sys.stdout.flush()
def func(params):
progress_print()
return (b - np.dot(Anp, params))
print('')
# Now find smallest values for delay constants
# Due to input bounds (ex: column limit), some delay elements may get eliminated entirely
print('Running leastsq w/ %d r, %d c (%d name)' % (rows, cols, len(names)))
# starting at 0 completes quickly, but gives a solution near 0 with terrible results
# so instead seed x0 with an estimate from the smallest net delay containing each variable
#x0 = np.array([1000.0 for _x in range(cols)])
print('Creating x0 estimate')
x0 = mkestimate(Anp, b)
print('Solving')
res = optimize.least_squares(func, x0, bounds=(0, float('inf')))
print('Done')
if outfn:
save(outfn, res.x, names, corner)
def main():
import argparse
parser = argparse.ArgumentParser(
description=
'Solve timing solution using least squares objective function')
parser.add_argument('--verbose', action='store_true', help='')
parser.add_argument(
'--massage',
action='store_true',
help='Derive additional constraints to improve solution')
parser.add_argument(
'--sub-json', help='Group substitutions to make fully ranked')
parser.add_argument('--corner', required=True, default="slow_max", help='')
parser.add_argument(
'--out', default=None, help='output timing delay .json')
parser.add_argument('fns_in', nargs='+', help='timing3.csv input files')
args = parser.parse_args()
# Store options in dict to ease passing through functions
bench = Benchmark()
fns_in = args.fns_in
if not fns_in:
fns_in = glob.glob('specimen_*/timing3.csv')
sub_json = None
if args.sub_json:
sub_json = load_sub(args.sub_json)
try:
timfuz_solve.run(
run_corner=run_corner,
sub_json=sub_json,
fns_in=fns_in,
corner=args.corner,
massage=args.massage,
outfn=args.out,
verbose=args.verbose)
finally:
print('Exiting after %s' % bench)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,229 @@
#!/usr/bin/env python3
import scipy.optimize as optimize
from timfuz import Benchmark, load_sub, A_ub_np2d, acorner2csv, corner_s2i
import numpy as np
import glob
import json
import math
import sys
import os
import time
import timfuz_solve
def save(outfn, xvals, names, corner):
# ballpark minimum actual observed delay is around 7 (carry chain)
# anything less than one is probably a solver artifact
delta = 0.5
corneri = corner_s2i[corner]
roundf = {
'fast_max': math.ceil,
'fast_min': math.floor,
'slow_max': math.ceil,
'slow_min': math.floor,
}[corner]
print('Writing results')
zeros = 0
with open(outfn, 'w') as fout:
# write as one variable per line
# this natively forms a bound if fed into linprog solver
fout.write('ico,fast_max fast_min slow_max slow_min,rows...\n')
for xval, name in zip(xvals, names):
row_ico = 1
# FIXME: only report for the given corner?
# also review ceil vs floor choice for min vs max
# let's be more conservative for now
if xval < delta:
print('WARNING: near 0 delay on %s: %0.6f' % (name, xval))
zeros += 1
#continue
items = [str(row_ico), acorner2csv(roundf(xval), corneri)]
items.append('%u %s' % (1, name))
fout.write(','.join(items) + '\n')
nonzeros = len(names) - zeros
print(
'Wrote: %u / %u constrained delays, %u zeros' %
(nonzeros, len(names), zeros))
def run_corner(
Anp, b, names, corner, verbose=False, opts={}, meta={}, outfn=None):
if len(Anp) == 0:
print('WARNING: zero equations')
if outfn:
save(outfn, [], [], corner)
return
maxcorner = {
'slow_max': True,
'slow_min': False,
'fast_max': True,
'fast_min': False,
}[corner]
# Given timing scores for above delays (-ps)
assert type(Anp[0]) is np.ndarray, type(Anp[0])
assert type(b) is np.ndarray, type(b)
#check_feasible(Anp, b)
'''
Be mindful of signs
t1, t2: total delay constants
d1, d2..: variables to solve for
Max corner intuitive form:
d1 + d2 + d4 >= t1
d2 + d3 >= t2
But need it in compliant form:
-d1 + -d2 + -d4 <= -t1
-d2 + -d3 <= -t2
Minimize delay elements
Min corner intuitive form:
d1 + d2 + d4 <= t1
d2 + d3 <= t2
Maximize delay elements
'''
rows = len(Anp)
cols = len(Anp[0])
if maxcorner:
print('maxcorner => scaling to solution form...')
b_ub = -1.0 * b
#A_ub = -1.0 * Anp
A_ub = [-1.0 * x for x in Anp]
else:
print('mincorner => no scaling required')
b_ub = b
A_ub = Anp
print('Creating misc constants...')
# Minimization function scalars
# Treat all logic elements as equally important
if maxcorner:
# Best result are min delays
c = [1 for _i in range(len(names))]
else:
# Best result are max delays
c = [-1 for _i in range(len(names))]
# Delays cannot be negative
# (this is also the default constraint)
#bounds = [(0, None) for _i in range(len(names))]
# Also you can provide one to apply to all
bounds = (0, None)
# Seems to take about rows + 3 iterations
# Give some margin
#maxiter = int(1.1 * rows + 100)
#maxiter = max(1000, int(1000 * rows + 1000))
# Most of the time I want it to just keep going unless I ^C it
maxiter = 1000000
if verbose >= 2:
print('b_ub', b_ub)
print('Unique delay elements: %d' % len(names))
print(' # delay minimization weights: %d' % len(c))
print(' # delay constraints: %d' % len(bounds))
print('Input paths')
print(' # timing scores: %d' % len(b))
print(' Rows: %d' % rows)
tlast = [time.time()]
iters = [0]
printn = [0]
def callback(xk, **kwargs):
iters[0] = kwargs['nit']
if time.time() - tlast[0] > 1.0:
sys.stdout.write('I:%d ' % kwargs['nit'])
tlast[0] = time.time()
printn[0] += 1
if printn[0] % 10 == 0:
sys.stdout.write('\n')
sys.stdout.flush()
print('')
# Now find smallest values for delay constants
# Due to input bounds (ex: column limit), some delay elements may get eliminated entirely
# https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.optimize.linprog.html
print('Running linprog w/ %d r, %d c (%d name)' % (rows, cols, len(names)))
res = optimize.linprog(
c,
A_ub=A_ub,
b_ub=b_ub,
bounds=bounds,
callback=callback,
options={
"disp": True,
'maxiter': maxiter,
'bland': True,
'tol': 1e-6,
})
nonzeros = 0
print('Ran %d iters' % iters[0])
if res.success:
print('Result sample (%d elements)' % (len(res.x)))
plim = 3
for xi, (name, x) in enumerate(zip(names, res.x)):
nonzero = x >= 0.001
if nonzero:
nonzeros += 1
#if nonzero and (verbose >= 1 or xi > 30):
if nonzero and (verbose or (
(nonzeros < 100 or nonzeros % 20 == 0) and nonzeros <= plim)):
print(' % 4u % -80s % 10.1f' % (xi, name, x))
print('Delay on %d / %d' % (nonzeros, len(res.x)))
if outfn:
save(outfn, res.x, names, corner)
def main():
import argparse
parser = argparse.ArgumentParser(
description=
'Solve timing solution using linear programming inequalities')
parser.add_argument('--verbose', action='store_true', help='')
parser.add_argument('--massage', action='store_true', help='')
parser.add_argument('--sub-csv', help='')
parser.add_argument(
'--sub-json', help='Group substitutions to make fully ranked')
parser.add_argument('--corner', required=True, default="slow_max", help='')
parser.add_argument(
'--out', default=None, help='output timing delay .json')
parser.add_argument('fns_in', nargs='+', help='timing3.csv input files')
args = parser.parse_args()
# Store options in dict to ease passing through functions
bench = Benchmark()
fns_in = args.fns_in
if not fns_in:
fns_in = glob.glob('specimen_*/timing3.csv')
sub_json = None
if args.sub_json:
sub_json = load_sub(args.sub_json)
try:
timfuz_solve.run(
run_corner=run_corner,
sub_json=sub_json,
sub_csv=args.sub_csv,
fns_in=fns_in,
corner=args.corner,
massage=args.massage,
outfn=args.out,
verbose=args.verbose)
finally:
print('Exiting after %s' % bench)
if __name__ == '__main__':
main()

2
fuzzers/007-timing/speed/.gitignore vendored Normal file
View File

@ -0,0 +1,2 @@
build

View File

@ -0,0 +1,12 @@
all: build/speed.json
build/node.txt: speed_json.py generate.tcl
mkdir -p build
cd build && vivado -mode batch -source ../generate.tcl
build/speed.json: build/node.txt
cd build && python ../speed_json.py speed_model.txt node.txt speed.json
clean:
rm -rf build

View File

@ -0,0 +1,4 @@
Generates speed.json, which describes the speed index to name translation.
These indices appear to be fixed between runs.
It is unknown how stable they are across Vivado versions.
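
The output has three top-level sections (speed_model, speed_type, cost_code),
as built by speed_json.py. A minimal sketch with illustrative entries
(SOME_WIRE_MODEL is a hypothetical name; fields abbreviated):

```
{
    "speed_model": {
        "SOME_WIRE_MODEL": {
            "name": "SOME_WIRE_MODEL",
            "speed_index": 123,
            "type": "wire"
        }
    },
    "speed_type": {
        "wire": {}
    },
    "cost_code": {
        "SLOWSINGLE": {
            "name": "SLOWSINGLE",
            "code": 4
        }
    }
}
```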

View File

@ -0,0 +1,173 @@
proc pin_info {pin} {
set cell [get_cells -of_objects $pin]
set bel [get_bels -of_objects $cell]
set site [get_sites -of_objects $bel]
return "$site $bel"
}
proc pin_bel {pin} {
set cell [get_cells -of_objects $pin]
set bel [get_bels -of_objects $cell]
return $bel
}
proc build_design_full {} {
create_project -force -part $::env(XRAY_PART) design design
read_verilog ../top.v
read_verilog ../../src/picorv32.v
synth_design -top top
#set_property LOCK_PINS {I0:A1 I1:A2 I2:A3 I3:A4 I4:A5 I5:A6} \
# [get_cells -quiet -filter {REF_NAME == LUT6} -hierarchical]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_00) IOSTANDARD LVCMOS33" [get_ports clk]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_01) IOSTANDARD LVCMOS33" [get_ports stb]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_02) IOSTANDARD LVCMOS33" [get_ports di]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_03) IOSTANDARD LVCMOS33" [get_ports do]
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]
set_property BITSTREAM.GENERAL.PERFRAMECRC YES [current_design]
set_property CLOCK_DEDICATED_ROUTE FALSE [get_nets clk_IBUF]
place_design
route_design
write_checkpoint -force design.dcp
write_bitstream -force design.bit
}
proc build_design_synth {} {
create_project -force -part $::env(XRAY_PART) design design
read_verilog ../top.v
read_verilog ../picorv32.v
synth_design -top top
}
# WARNING: [Common 17-673] Cannot get value of property 'FORWARD' because this property is not valid in conjunction with other property setting on this object.
# WARNING: [Common 17-673] Cannot get value of property 'REVERSE' because this property is not valid in conjunction with other property setting on this object.
proc speed_models1 {} {
set outdir "."
set fp [open "$outdir/speed_model.txt" w]
# list_property [lindex [get_speed_models] 0]
set speed_models [get_speed_models]
set properties [list_property [lindex $speed_models 0]]
# "CLASS DELAY FAST_MAX FAST_MIN IS_INSTANCE_SPECIFIC NAME NAME_LOGICAL SLOW_MAX SLOW_MIN SPEED_INDEX TYPE"
puts $fp $properties
set needspace 0
foreach speed_model $speed_models {
foreach property $properties {
if $needspace {
puts -nonewline $fp " "
}
puts -nonewline $fp [get_property $property $speed_model]
set needspace 1
}
puts $fp ""
}
close $fp
}
proc speed_models2 {} {
set outdir "."
set fp [open "$outdir/speed_model.txt" w]
# list_property [lindex [get_speed_models] 0]
set speed_models [get_speed_models]
puts "Items: [llength $speed_models]"
set needspace 0
# Not all objects have the same properties
# But they do seem to have the same list
set properties [list_property [lindex $speed_models 0]]
foreach speed_model $speed_models {
set needspace 0
foreach property $properties {
set val [get_property $property $speed_model]
if {"$val" ne ""} {
if $needspace {
puts -nonewline $fp " "
}
puts -nonewline $fp "$property:$val"
set needspace 1
}
}
puts $fp ""
}
close $fp
}
# For cost codes
# Items: 2663055
# Hmm, too much
# Let's filter down to the items we really want
proc nodes_all {} {
set outdir "."
set fp [open "$outdir/node_all.txt" w]
set items [get_nodes]
puts "Items: [llength $items]"
set needspace 0
set properties [list_property [lindex $items 0]]
foreach item $items {
set needspace 0
foreach property $properties {
set val [get_property $property $item]
if {"$val" ne ""} {
if $needspace {
puts -nonewline $fp " "
}
puts -nonewline $fp "$property:$val"
set needspace 1
}
}
puts $fp ""
}
close $fp
}
# Only writes out items with unique cost codes
# (much faster)
proc nodes_unique_cc {} {
set outdir "."
set fp [open "$outdir/node.txt" w]
set items [get_nodes]
puts "Computing cost codes with [llength $items] items"
set needspace 0
set properties [list_property [lindex $items 0]]
set cost_codes_known [dict create]
set itemi 0
foreach item $items {
incr itemi
set cost_code [get_property COST_CODE $item]
if {[ dict exists $cost_codes_known $cost_code ]} {
continue
}
puts " Adding cost code $cost_code @ item $itemi"
dict set cost_codes_known $cost_code 1
set needspace 0
foreach property $properties {
set val [get_property $property $item]
if {"$val" ne ""} {
if $needspace {
puts -nonewline $fp " "
}
puts -nonewline $fp "$property:$val"
set needspace 1
}
}
puts $fp ""
}
close $fp
}
build_design_full
speed_models2
nodes_unique_cc

View File

@ -0,0 +1,94 @@
import json
def load_speed(fin):
speed_models = {}
speed_types = {}
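# Each input line is whitespace separated PROPERTY:VALUE pairs as written
# by generate.tcl, e.g. (illustrative values):
#   NAME:SOME_MODEL NAME_LOGICAL:SOME_MODEL SPEED_INDEX:123 TYPE:wire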
for l in fin:
delay = {}
l = l.strip()
for kvs in l.split():
name, value = kvs.split(':')
name = name.lower()
if name in ('class', ):
continue
if name in ('speed_index', ):
value = int(value)
if name == 'type':
speed_types.setdefault(value, {})
delay[name] = value
delayk = delay['name']
if delayk in speed_models:
raise Exception("Duplicate name")
if "name" in delay and "name_logical" in delay:
# Always true
if delay['name'] != delay['name_logical']:
raise Exception("nope!")
# Found a counter example
if 0 and delay['name'] != delay['forward']:
# ('BSW_NONTLFW_TLRV', '_BSW_LONG_NONTLFORWARD')
print(delay['name'], delay['forward'])
raise Exception("nope!")
# Found a counter example
if 0 and delay['forward'] != delay['reverse']:
# _BSW_LONG_NONTLFORWARD _BSW_LONG_TLREVERSE
print(delay['forward'], delay['reverse'])
raise Exception("nope!")
speed_models[delayk] = delay
return speed_models, speed_types
def load_cost_code(fin):
# COST_CODE:4 COST_CODE_NAME:SLOWSINGLE
cost_codes = {}
for l in fin:
lj = {}
l = l.strip()
for kvs in l.split():
name, value = kvs.split(':')
name = name.lower()
lj[name] = value
cost_code = {
'name': lj['cost_code_name'],
'code': int(lj['cost_code']),
# Hmm is this unique per type?
#'speed_class': int(lj['speed_class']),
}
cost_codes[cost_code['name']] = cost_code
return cost_codes
def run(speed_fin, node_fin, fout, verbose=0):
print('Loading data')
speed_models, speed_types = load_speed(speed_fin)
cost_codes = load_cost_code(node_fin)
j = {
'speed_model': speed_models,
'speed_type': speed_types,
'cost_code': cost_codes,
}
json.dump(j, fout, sort_keys=True, indent=4, separators=(',', ': '))
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(description='Timing fuzzer')
parser.add_argument('--verbose', type=int, help='')
parser.add_argument(
'speed_fn_in', default='/dev/stdin', nargs='?', help='Input file')
parser.add_argument(
'node_fn_in', default='/dev/stdin', nargs='?', help='Input file')
parser.add_argument(
'fn_out', default='/dev/stdout', nargs='?', help='Output file')
args = parser.parse_args()
run(
open(args.speed_fn_in, 'r'),
open(args.node_fn_in, 'r'),
open(args.fn_out, 'w'),
verbose=args.verbose)

View File

@ -0,0 +1,8 @@
module top(input clk, stb, di, output do);
reg dor;
always @(posedge clk) begin
dor <= stb & di;
end
assign do = dor;
endmodule

File diff suppressed because it is too large

View File

@ -0,0 +1,89 @@
#!/usr/bin/env python3
from timfuz import Benchmark
import json
def write_state(state, fout):
j = {
'names': dict([(x, None) for x in state.names]),
'drop_names': list(state.drop_names),
'base_names': list(state.base_names),
'subs': dict([(name, values) for name, values in state.subs.items()]),
'pivots': state.pivots,
}
json.dump(j, fout, sort_keys=True, indent=4, separators=(',', ': '))
def gen_rows(fn_ins):
for fn_in in fn_ins:
try:
print('Loading %s' % fn_in)
j = json.load(open(fn_in, 'r'))
group0 = list(j['subs'].values())[0]
value0 = list(group0.values())[0]
if type(value0) is float:
print("WARNING: skipping old format JSON")
continue
else:
print("Value OK")
for sub in j['subs'].values():
row_ds = {}
# TODO: convert to gcd
# den may not always be 1
# lazy solution: just multiply out all the fractions
n = 1
for _var, (_num, den) in sub.items():
n *= den
for var, (num, den) in sub.items():
num2 = n * num
assert num2 % den == 0
row_ds[var] = num2 // den
yield row_ds
except:
print("Error processing %s" % fn_in)
raise
def run(fnout, fn_ins, verbose=0):
print('Loading data')
with open(fnout, 'w') as fout:
fout.write('ico,fast_max fast_min slow_max slow_min,rows...\n')
for row_ds in gen_rows(fn_ins):
ico = '1'
out_b = [1e9, 1e9, 1e9, 1e9]
items = [ico, ' '.join(['%u' % x for x in out_b])]
for k, v in sorted(row_ds.items()):
items.append('%i %s' % (v, k))
fout.write(','.join(items) + '\n')
def main():
import argparse
parser = argparse.ArgumentParser(
description=
'Convert substitution groups into .csv to allow incremental rref results'
)
parser.add_argument('--verbose', action='store_true', help='')
parser.add_argument('--out', help='Output csv')
parser.add_argument('fns_in', nargs='+', help='sub.json input files')
args = parser.parse_args()
bench = Benchmark()
fns_in = args.fns_in
try:
run(fnout=args.out, fn_ins=args.fns_in, verbose=args.verbose)
finally:
print('Exiting after %s' % bench)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,61 @@
#!/usr/bin/env python3
import timfuz
from timfuz import loadc_Ads_bs, Ads2bounds
import sys
import os
import time
import json
def run(fns_in, fnout, tile_json_fn, verbose=False):
# modified in place
tilej = json.load(open(tile_json_fn, 'r'))
for fnin in fns_in:
Ads, bs = loadc_Ads_bs([fnin], ico=True)
bounds = Ads2bounds(Ads, bs)
for tile in tilej['tiles'].values():
pips = tile['pips']
for k, v in pips.items():
pips[k] = bounds.get('PIP_' + v, [None, None, None, None])
wires = tile['wires']
for k, v in wires.items():
wires[k] = bounds.get('WIRE_' + v, [None, None, None, None])
timfuz.tilej_stats(tilej)
json.dump(
tilej,
open(fnout, 'w'),
sort_keys=True,
indent=4,
separators=(',', ': '))
def main():
import argparse
parser = argparse.ArgumentParser(
description=
'Substitute timgrid timing model names for real timing values')
parser.add_argument(
'--timgrid-s',
default='../../timgrid/build/timgrid-s.json',
help='tilegrid timing delay symbolic input (timgrid-s.json)')
parser.add_argument(
'--out',
default='build/timgrid-vc.json',
help='tilegrid timing delay values at corner (timgrid-vc.json)')
parser.add_argument(
'fn_ins', nargs='+', help='Input flattened timing csv (flat.json)')
args = parser.parse_args()
run(args.fn_ins, args.out, args.timgrid_s, verbose=False)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,849 @@
#!/usr/bin/env python
import math
import numpy as np
from collections import OrderedDict
import time
import re
import os
import datetime
import json
import copy
import sys
import random
import glob
from fractions import Fraction
from benchmark import Benchmark
NAME_ZERO = set(
[
"BSW_CLK_ZERO",
"BSW_ZERO",
"B_ZERO",
"C_CLK_ZERO",
"C_DSP_ZERO",
"C_ZERO",
"I_ZERO",
"O_ZERO",
"RC_ZERO",
"R_ZERO",
])
# csv index
corner_s2i = OrderedDict(
[
('fast_max', 0),
('fast_min', 1),
('slow_max', 2),
('slow_min', 3),
])
# Equations are filtered out until nothing is left
class SimplifiedToZero(Exception):
pass
def allow_zero_eqns():
return os.getenv('ALLOW_ZERO_EQN', 'N') == 'Y'
def print_eqns(A_ubd, b_ub, verbose=0, lim=3, label=''):
rows = len(b_ub)
print('Sample equations (%s) from %d r' % (label, rows))
prints = 0
#verbose = 1
for rowi, row in enumerate(A_ubd):
if verbose or ((rowi < 10 or rowi % max(1, (rows / 20)) == 0) and
(not lim or prints < lim)):
line = ' EQN: p%u: ' % rowi
for k, v in sorted(row.items()):
line += '%u*t%d ' % (v, k)
line += '= %d' % b_ub[rowi]
print(line)
prints += 1
def print_name_eqns(A_ubd, b_ub, names, verbose=0, lim=3, label=''):
rows = len(b_ub)
print('Sample equations (%s) from %d r' % (label, rows))
prints = 0
#verbose = 1
for rowi, row in enumerate(A_ubd):
if verbose or ((rowi < 10 or rowi % max(1, (rows / 20)) == 0) and
(not lim or prints < lim)):
line = ' EQN: p%u: ' % rowi
for k, v in sorted(row.items()):
line += '%u*%s ' % (v, names[k])
line += '= %d' % b_ub[rowi]
print(line)
prints += 1
def print_names(names, verbose=1):
print('Names: %d' % len(names))
for xi, name in enumerate(names):
print(' % 4u % -80s' % (xi, name))
def invb(b_ub):
#return [-b for b in b_ub]
return -np.array(b_ub)
def check_feasible_d(A_ubd, b_ub, names):
A_ub, b_ub_inv = Ab_d2np(A_ubd, b_ub, names)
check_feasible(A_ub, b_ub_inv)
def check_feasible(A_ub, b_ub):
sys.stdout.write('Check feasible ')
sys.stdout.flush()
rows = len(b_ub)
cols = len(A_ub[0])
progress = max(1, rows / 100)
# Choose a high arbitrary value for x
# Delays should be on the order of ns, so a 10 ns delay should be way above anything realistic
xs = [10e3 for _i in range(cols)]
# FIXME: use the correct np function to do this for me
# Verify bounds
#b_res = np.matmul(A_ub, xs)
#print(type(A_ub), type(xs)
#A_ub = np.array(A_ub)
#xs = np.array(xs)
#b_res = np.matmul(A_ub, xs)
def my_mul(A_ub, xs):
#print('cols', cols
#print('rows', rows
ret = [None] * rows
for row in range(rows):
this = 0
for col in range(cols):
this += A_ub[row][col] * xs[col]
ret[row] = this
return ret
b_res = my_mul(A_ub, xs)
# Verify bound was respected
for rowi, (this_b, this_b_ub) in enumerate(zip(b_res, b_ub)):
if rowi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
if this_b >= this_b_ub or this_b > 0:
print(
'% 4d Want res % 10.1f <= % 10.1f <= 0' %
(rowi, this_b, this_b_ub))
raise Exception("Bad ")
print(' done')
def Ab_ub_dt2d(eqns):
'''Convert dict using the rows as keys into a list of dicts + b_ub list (ie return A_ub, b_ub)'''
#return [dict(rowt) for rowt in eqns]
rows = [(dict(rowt), b) for rowt, b in eqns.items()]
A_ubd, b_ub = zip(*rows)
return list(A_ubd), list(b_ub)
# This significantly reduces runtime
def simplify_rows(Ads, b_ub, remove_zd=False, corner=None):
'''Remove duplicate equations, taking highest delay'''
# dict of constants to highest delay
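# e.g. (illustrative, slow_max corner): seeing the row {t0: 1, t1: 1} with b
# values of both 15 and 17 keeps a single copy with b = 17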
eqns = OrderedDict()
assert len(Ads) == len(b_ub), (len(Ads), len(b_ub))
assert corner is not None
minmax = {
'fast_max': max,
'fast_min': min,
'slow_max': max,
'slow_min': min,
}[corner]
# An outlier to make unknown values be ignored
T_UNK = {
'fast_max': 0,
'fast_min': 10e9,
'slow_max': 0,
'slow_min': 10e9,
}[corner]
sys.stdout.write('SimpR ')
sys.stdout.flush()
progress = int(max(1, len(b_ub) / 100))
zero_ds = 0
zero_es = 0
for loopi, (b, rowd) in enumerate(zip(b_ub, Ads)):
if loopi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
# TODO: elements have zero delay (ex: COUT)
# Remove these for now since they make me nervous
# Although they should just solve to 0
if remove_zd and not b:
zero_ds += 1
continue
# I think I ran into these before when taking out ZERO elements
if len(rowd) == 0:
if b != 0:
assert zero_es == 0, 'Unexpected zero element row with non-zero delay'
zero_es += 1
continue
rowt = Ar_ds2t(rowd)
eqns[rowt] = minmax(eqns.get(rowt, T_UNK), b)
print(' done')
print(
'Simplify rows: %d => %d rows w/ zd %d, ze %d' %
(len(b_ub), len(eqns), zero_ds, zero_es))
if len(eqns) == 0:
raise SimplifiedToZero()
A_ubd_ret, b_ub_ret = Ab_ub_dt2d(eqns)
return A_ubd_ret, b_ub_ret
def A_ubr_np2d(row):
'''Convert a single row'''
#d = {}
d = OrderedDict()
for coli, val in enumerate(row):
if val:
d[coli] = val
return d
def A_ub_np2d(A_ub):
'''Convert A_ub entries in numpy matrix to dictionary / sparse form'''
Adi = [None] * len(A_ub)
for i, row in enumerate(A_ub):
Adi[i] = A_ubr_np2d(row)
return Adi
def Ar_di2np(row_di, cols):
rownp = np.zeros(cols)
for coli, val in row_di.items():
# Sign inversion due to way solver works
rownp[coli] = val
return rownp
# NOTE: sign inversion
def A_di2np(Adi, cols):
'''Convert A_ub entries in dictionary / sparse to numpy matrix form'''
return [Ar_di2np(row_di, cols) for row_di in Adi]
def Ar_ds2t(rowd):
'''Convert a dictionary row into a tuple with (column number, value) tuples'''
return tuple(sorted(rowd.items()))
def A_ubr_t2d(rowt):
'''Convert a tuple row of (column number, value) tuples back into an ordered dict'''
return OrderedDict(rowt)
def A_ub_d2t(A_ubd):
'''Convert rows as dicts to rows as tuples'''
return [Ar_ds2t(rowd) for rowd in A_ubd]
def A_ub_t2d(A_ubd):
'''Convert rows as tuples to rows as dicts'''
return [OrderedDict(rowt) for rowt in A_ubd]
def Ab_d2np(A_ubd, b_ub, names):
A_ub = A_di2np(A_ubd, len(names))
b_ub_inv = invb(b_ub)
return A_ub, b_ub_inv
def Ab_np2d(A_ub, b_ub_inv):
A_ubd = A_ub_np2d(A_ub)
b_ub = invb(b_ub_inv)
return A_ubd, b_ub
def sort_equations(Ads, b):
# Track rows with value column
# Hmm can't sort against np arrays
tosort = [(sorted(row.items()), rowb) for row, rowb in zip(Ads, b)]
#res = sorted(tosort, key=lambda e: e[0])
res = sorted(tosort)
A_ubtr, b_ubr = zip(*res)
return [OrderedDict(rowt) for rowt in A_ubtr], b_ubr
def derive_eq_by_row(A_ubd, b_ub, verbose=0, col_lim=0, tweak=False):
'''
Derive equations by subtracting whole rows
Given equations like:
t0 >= 10
t0 + t1 >= 15
t0 + t1 + t2 >= 17
When I look at these, I think of a solution something like:
t0 = 10
t1 = 5
t2 = 2
However, linprog tends to choose solutions like:
t0 = 17
t1 = 0
t2 = 0
To this end, add additional constraints by finding equations that are subsets of other equations
How to do this in a reasonable time span?
Also equations are sparse, which makes this harder to compute
'''
rows = len(A_ubd)
assert rows == len(b_ub)
# Index equations into hash maps so can lookup sparse elements quicker
assert len(A_ubd) == len(b_ub)
A_ubd_ret = copy.copy(A_ubd)
assert len(A_ubd) == len(A_ubd_ret)
#print('Finding subsets')
ltes = 0
scs = 0
b_ub_ret = list(b_ub)
sys.stdout.write('Deriving rows ')
sys.stdout.flush()
progress = max(1, rows / 100)
for row_refi, row_ref in enumerate(A_ubd):
if row_refi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
if col_lim and len(row_ref) > col_lim:
continue
for row_cmpi, row_cmp in enumerate(A_ubd):
if row_refi == row_cmpi or col_lim and len(row_cmp) > col_lim:
continue
# FIXME: this check was supposed to be removed
'''
Every element in row_cmp is in row_ref
but this doesn't mean the constants are smaller
Filter these out
'''
# XXX: just reduce and filter out solutions with positive constants
# or actually are these also useful as is?
lte = lte_const(row_ref, row_cmp)
if lte:
ltes += 1
sc = 0 and shared_const(row_ref, row_cmp)
if sc:
scs += 1
if lte or sc:
if verbose:
print('')
print('match')
print(' ', row_ref, b_ub[row_refi])
print(' ', row_cmp, b_ub[row_cmpi])
# Reduce
A_new = reduce_const(row_ref, row_cmp)
# Did this actually significantly reduce the search space?
#if tweak and len(A_new) > 4 and len(A_new) > len(row_cmp) / 2:
if tweak and len(A_new) > 8 and len(A_new) > len(row_cmp) / 2:
continue
b_new = b_ub[row_refi] - b_ub[row_cmpi]
# Definitely possible
# Maybe filter these out if they occur?
if verbose:
print(b_new)
# Also inverted sign
if b_new <= 0:
if verbose:
print("Unexpected b")
continue
if verbose:
print('OK')
A_ubd_ret.append(A_new)
b_ub_ret.append(b_new)
print(' done')
#A_ub_ret = A_di2np(A_ubd2, cols=cols)
print(
'Derive row: %d => %d rows using %d lte, %d sc' %
(len(b_ub), len(b_ub_ret), ltes, scs))
assert len(A_ubd_ret) == len(b_ub_ret)
return A_ubd_ret, b_ub_ret
def derive_eq_by_col(A_ubd, b_ub, verbose=0):
'''
Derive equations by subtracting out all bounded constants (ie "known" columns)
'''
rows = len(A_ubd)
# Find and index equations with a single constraint
knowns = {}
sys.stdout.write('Derive col indexing ')
#A_ubd = A_ub_np2d(A_ub)
sys.stdout.flush()
progress = max(1, rows / 100)
for row_refi, row_refd in enumerate(A_ubd):
if row_refi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
if len(row_refd) == 1:
k, v = list(row_refd.items())[0]
# Reduce any constants to canonical form
if v != 1:
row_refd[k] = 1
b_ub[row_refi] /= v
knowns[k] = b_ub[row_refi]
print(' done')
#knowns_set = set(knowns.keys())
print('%d constrained' % len(knowns))
'''
Now see what we can do
Rows that are already constrained: eliminate
TODO: maybe keep these if this would violate their constraint
Otherwise eliminate the original row and generate a simplified result now
'''
b_ub_ret = []
A_ubd_ret = []
sys.stdout.write('Derive col main ')
sys.stdout.flush()
progress = max(1, rows / 100)
for row_refi, row_refd in enumerate(A_ubd):
if row_refi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
# Reduce as much as possible
#row_new = {}
row_new = OrderedDict()
b_new = b_ub[row_refi]
# Copy over single entries
if len(row_refd) == 1:
row_new = row_refd
else:
for k, v in row_refd.items():
if k in knowns:
# Remove column and take out corresponding delay
b_new -= v * knowns[k]
# Copy over
else:
row_new[k] = v
# Possibly reduced all usable constants out
if len(row_new) == 0:
continue
if b_new <= 0:
continue
A_ubd_ret.append(row_new)
b_ub_ret.append(b_new)
print(' done')
print('Derive col: %d => %d rows' % (len(b_ub), len(b_ub_ret)))
return A_ubd_ret, b_ub_ret
def col_dist(Ads, desc='of', names=[], lim=0):
'''Print frequency distribution of the number of elements in a given row'''
rows = len(Ads)
cols = len(names)
fs = {}
for row in Ads:
this_cols = len(row)
fs[this_cols] = fs.get(this_cols, 0) + 1
print(
'Col count distribution (%s) for %dr x %dc w/ %d freqs' %
(desc, rows, cols, len(fs)))
prints = 0
for i, (k, v) in enumerate(sorted(fs.items())):
if lim == 0 or (lim and prints < lim or i == len(fs) - 1):
print(' %d: %d' % (k, v))
prints += 1
if lim and prints == lim:
print(' ...')
def name_dist(A_ubd, desc='of', names=[], lim=0):
'''Print frequency distribution of the number of times an element appears'''
rows = len(A_ubd)
cols = len(names)
fs = {i: 0 for i in range(len(names))}
for row in A_ubd:
for k in row.keys():
fs[k] += 1
print('Name count distribution (%s) for %dr x %dc' % (desc, rows, cols))
prints = 0
for namei, name in enumerate(names):
if lim == 0 or (lim and prints < lim or namei == len(fs) - 1):
print(' %s: %d' % (name, fs[namei]))
prints += 1
if lim and prints == lim:
print(' ...')
fs2 = {}
for v in fs.values():
fs2[v] = fs2.get(v, 0) + 1
prints = 0
print('Distribution distribution (%d items)' % len(fs2))
for i, (k, v) in enumerate(sorted(fs2.items())):
if lim == 0 or (lim and prints < lim or i == len(fs2) - 1):
print(' %s: %s' % (k, v))
prints += 1
if lim and prints == lim:
print(' ...')
zeros = fs2.get(0, 0)
if zeros:
raise Exception("%d names without equation" % zeros)
def filter_ncols(A_ubd, b_ub, cols_min=0, cols_max=0):
'''Only keep equations with a few delay elements'''
A_ubd_ret = []
b_ub_ret = []
#print('Removing large rows')
for rowd, b in zip(A_ubd, b_ub):
if (not cols_min or len(rowd) >= cols_min) and (not cols_max or
len(rowd) <= cols_max):
A_ubd_ret.append(rowd)
b_ub_ret.append(b)
print(
'Filter ncols w/ %d <= cols <= %d: %d ==> %d rows' %
(cols_min, cols_max, len(b_ub), len(b_ub_ret)))
assert len(b_ub_ret)
return A_ubd_ret, b_ub_ret
def Ar_di2ds(rowA, names):
row = OrderedDict()
for k, v in rowA.items():
row[names[k]] = v
return row
def A_di2ds(Adi, names):
rows = []
for row_di in Adi:
rows.append(Ar_di2ds(row_di, names))
return rows
def Ar_ds2di(row_ds, names):
def keyi(name):
if name not in names:
names[name] = len(names)
return names[name]
row_di = OrderedDict()
for k, v in row_ds.items():
row_di[keyi(k)] = v
return row_di
def A_ds2di(rows):
names = OrderedDict()
A_ubd = []
for row_ds in rows:
A_ubd.append(Ar_ds2di(row_ds, names))
return list(names.keys()), A_ubd
def A_ds2np(Ads):
names, Adi = A_ds2di(Ads)
return names, A_di2np(Adi, len(names))
def loadc_Ads_mkb(fns, mkb, filt):
bs = []
Ads = []
for fn in fns:
with open(fn, 'r') as f:
# skip header
f.readline()
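# Data lines follow the header "ico,fast_max fast_min slow_max slow_min,rows..."
# e.g. (illustrative): 1,None None 123 None,1 WIRE_SOME_NODE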
for l in f:
cols = l.split(',')
ico = bool(int(cols[0]))
corners = cols[1]
vars = cols[2:]
def mkcorner(bstr):
if bstr == 'None':
return None
else:
return int(bstr)
corners = [mkcorner(corner) for corner in corners.split()]
def mkvar(x):
i, var = x.split()
return (var, int(i))
vars = OrderedDict([mkvar(var) for var in vars])
if not filt(ico, corners, vars):
continue
bs.append(mkb(corners))
Ads.append(vars)
return Ads, bs
def loadc_Ads_b(fns, corner, ico=None):
corner = corner or "slow_max"
corneri = corner_s2i[corner]
if ico is not None:
filt = lambda ico_, corners, vars: ico_ == ico
else:
filt = lambda ico_, corners, vars: True
def mkb(val):
return val[corneri]
return loadc_Ads_mkb(fns, mkb, filt)
def loadc_Ads_bs(fns, ico=None):
if ico is not None:
filt = lambda ico_, corners, vars: ico_ == ico
else:
filt = lambda ico_, corners, vars: True
def mkb(val):
return val
return loadc_Ads_mkb(fns, mkb, filt)
def loadc_Ads_raw(fns):
filt = lambda ico, corners, vars: True
def mkb(val):
return val
return loadc_Ads_mkb(fns, mkb, filt)
def index_names(Ads):
names = set()
for row_ds in Ads:
for k1 in row_ds.keys():
names.add(k1)
return names
def load_sub(fn):
j = json.load(open(fn, 'r'))
for name, vals in sorted(j['subs'].items()):
for k, v in vals.items():
vals[k] = Fraction(v[0], v[1])
return j
def row_sub_vars(row, sub_json, strict=False, verbose=False):
if 0 and verbose:
print("")
print(row.items())
delvars = 0
for k in sub_json['drop_names']:
try:
del row[k]
delvars += 1
except KeyError:
pass
if verbose:
print("Deleted %u variables" % delvars)
if verbose:
print('Checking pivots')
print(sorted(row.items()))
for group, pivot in sorted(sub_json['pivots'].items()):
if pivot not in row:
continue
n = row[pivot]
print(' pivot %u %s' % (n, pivot))
#pivots = set(sub_json['pivots'].values()).intersection(row.keys())
for group, pivot in sorted(sub_json['pivots'].items()):
if pivot not in row:
continue
# take the sub out n times
# note constants may be negative
n = row[pivot]
if verbose:
print('pivot %i %s' % (n, pivot))
for subk, subv in sorted(sub_json['subs'][group].items()):
oldn = row.get(subk, type(subv)(0))
rown = -n * subv
rown += oldn
if verbose:
print(" %s: %d => %d" % (subk, oldn, rown))
if rown == 0:
# only becomes zero if didn't previously exist
del row[subk]
if verbose:
print(" del")
else:
row[subk] = rown
row[group] = n
assert pivot not in row
# after all constants are applied, the row should end up positive?
# numeric precision issues previously limited this
# Ex: AssertionError: ('PIP_BSW_2ELSING0', -2.220446049250313e-16)
if strict:
# verify no subs are left
for subs in sub_json['subs'].values():
for sub in subs:
assert sub not in row, 'non-pivot element after group sub %s' % sub
# Verify all constants are positive
for k, v in sorted(row.items()):
assert v > 0, (k, v)
def run_sub_json(Ads, sub_json, strict=False, verbose=False):
'''
strict: complain if a sub doesn't go in evenly
'''
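# e.g. (illustrative): with pivots {'GRP_0': 'x0'} and
# subs {'GRP_0': {'x0': 1, 'x1': 1/2}} (Fractions as loaded by load_sub),
# the row {'x0': 2, 'x1': 1} rewrites to {'GRP_0': 2}: the pivot is
# replaced by its group and x1 cancels out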
nrows = 0
nsubs = 0
ncols_old = 0
ncols_new = 0
print('Subbing %u rows' % len(Ads))
prints = set()
for rowi, row in enumerate(Ads):
if 0 and verbose:
print(row)
if verbose:
print('')
print('Row %u w/ %u elements' % (rowi, len(row)))
row_orig = dict(row)
row_sub_vars(row, sub_json, strict=strict, verbose=verbose)
nrows += 1
if row_orig != row:
nsubs += 1
if verbose:
rowt = Ar_ds2t(row)
if rowt not in prints:
print('row', row)
prints.add(rowt)
ncols_old += len(row_orig)
ncols_new += len(row)
if verbose:
print('')
print("Sub: %u / %u rows changed" % (nsubs, nrows))
print("Sub: %u => %u non-zero row cols" % (ncols_old, ncols_new))
def print_eqns(Ads, b, verbose=0, lim=3, label=''):
rows = len(b)
print('Sample equations (%s) from %d r' % (label, rows))
prints = 0
for rowi, row in enumerate(Ads):
if verbose or ((rowi < 10 or rowi % max(1, (rows / 20)) == 0) and
(not lim or prints < lim)):
line = ' EQN: p%u: ' % rowi
for k, v in sorted(row.items()):
line += '%u*t%s ' % (v, k)
line += '= %d' % b[rowi]
print(line)
prints += 1
def print_eqns_np(A_ub, b_ub, verbose=0):
Adi = A_ub_np2d(A_ub)
print_eqns(Adi, b_ub, verbose=verbose)
def Ads2bounds(Ads, bs):
ret = {}
for row_ds, row_bs in zip(Ads, bs):
assert len(row_ds) == 1
k, v = list(row_ds.items())[0]
assert v == 1
ret[k] = row_bs
return ret
def instances(Ads):
ret = 0
for row_ds in Ads:
ret += sum(row_ds.values())
return ret
def acorner2csv(b, corneri):
corners = ["None" for _ in range(4)]
corners[corneri] = str(b)
return ' '.join(corners)
def corners2csv(bs):
assert len(bs) == 4
corners = ["None" if b is None else str(b) for b in bs]
return ' '.join(corners)
def tilej_stats(tilej):
stats = {}
for etype in ('pips', 'wires'):
tm = stats.setdefault(etype, {})
tm['net'] = 0
tm['solved'] = [0, 0, 0, 0]
tm['covered'] = [0, 0, 0, 0]
for tile in tilej['tiles'].values():
for etype in ('pips', 'wires'):
pips = tile[etype]
for k, v in pips.items():
stats[etype]['net'] += 1
for i in range(4):
if pips[k][i]:
stats[etype]['solved'][i] += 1
if pips[k][i] is not None:
stats[etype]['covered'][i] += 1
for corner, corneri in corner_s2i.items():
print('Corner %s' % corner)
for etype in ('pips', 'wires'):
net = stats[etype]['net']
solved = stats[etype]['solved'][corneri]
covered = stats[etype]['covered'][corneri]
print(
' %s: %u / %u solved, %u / %u covered' %
(etype, solved, net, covered, net))

View File

@ -0,0 +1,422 @@
#!/usr/bin/env python3
from timfuz import simplify_rows, print_eqns, print_eqns_np, sort_equations, col_dist, index_names
import numpy as np
import math
import sys
import datetime
import os
import time
import copy
from collections import OrderedDict
def lte_const(row_ref, row_cmp):
'''Return True if every constant in row_cmp appears in row_ref with no larger value'''
#return False
for k, vc in row_cmp.items():
vr = row_ref.get(k, None)
# Not in reference?
if vr is None:
return False
if vr < vc:
return False
return True
def shared_const(row_ref, row_cmp):
'''Return true if more constants are equal than not equal'''
#return False
matches = 0
unmatches = 0
ks = list(row_ref.keys()) + list(row_cmp.keys())
for k in ks:
vr = row_ref.get(k, None)
vc = row_cmp.get(k, None)
# At least one
if vr is not None and vc is not None:
if vc == vr:
matches += 1
else:
unmatches += 1
else:
unmatches += 1
# Will equation reduce if subtracted?
return matches > unmatches
def reduce_const(row_ref, row_cmp):
'''Subtract cmp constants from ref'''
#ret = {}
ret = OrderedDict()
ks = set(row_ref.keys())
ks.update(set(row_cmp.keys()))
for k in ks:
vr = row_ref.get(k, 0)
vc = row_cmp.get(k, 0)
res = vr - vc
if res:
ret[k] = res
return ret
def derive_eq_by_row(Ads, b, verbose=0, col_lim=0, tweak=False):
'''
Derive equations by subtracting whole rows
Given equations like:
t0 >= 10
t0 + t1 >= 15
t0 + t1 + t2 >= 17
When I look at these, I think of a solution something like:
t0 = 10
t1 = 5
t2 = 2
However, linprog tends to choose solutions like:
t0 = 17
t1 = 0
t2 = 0
To this end, add additional constraints by finding equations that are subsets of other equations
How to do this in a reasonable time span?
Also equations are sparse, which makes this harder to compute
'''
assert len(Ads) == len(b), 'Ads, b length mismatch'
rows = len(Ads)
# Index equations into hash maps so can lookup sparse elements quicker
assert len(Ads) == len(b)
Ads_ret = copy.copy(Ads)
assert len(Ads) == len(Ads_ret)
#print('Finding subsets')
ltes = 0
scs = 0
b_ret = list(b)
sys.stdout.write('Deriving rows (%u) ' % rows)
sys.stdout.flush()
progress = int(max(1, rows / 100))
for row_refi, row_ref in enumerate(Ads):
if row_refi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
if col_lim and len(row_ref) > col_lim:
continue
for row_cmpi, row_cmp in enumerate(Ads):
if row_refi == row_cmpi or col_lim and len(row_cmp) > col_lim:
continue
# FIXME: this check was supposed to be removed
'''
Every element in row_cmp is in row_ref
but this doesn't mean the constants are smaller
Filter these out
'''
# XXX: just reduce and filter out solutions with positive constants
# or actually are these also useful as is?
lte = lte_const(row_ref, row_cmp)
if lte:
ltes += 1
sc = 0 and shared_const(row_ref, row_cmp)
if sc:
scs += 1
if lte or sc:
if verbose:
print('')
print('match')
print(' ', row_ref, b[row_refi])
print(' ', row_cmp, b[row_cmpi])
# Reduce
A_new = reduce_const(row_ref, row_cmp)
# Did this actually significantly reduce the search space?
#if tweak and len(A_new) > 4 and len(A_new) > len(row_cmp) / 2:
if tweak and len(A_new) > 8 and len(A_new) > len(row_cmp) / 2:
continue
b_new = b[row_refi] - b[row_cmpi]
# Definitely possible
# Maybe filter these out if they occur?
if verbose:
print(b_new)
# Also inverted sign
if b_new <= 0:
if verbose:
print("Unexpected b")
continue
if verbose:
print('OK')
Ads_ret.append(A_new)
b_ret.append(b_new)
print(' done')
#A_ub_ret = A_di2np(Ads2, cols=cols)
print(
'Derive row: %d => %d rows using %d lte, %d sc' %
(len(b), len(b_ret), ltes, scs))
assert len(Ads_ret) == len(b_ret)
return Ads_ret, b_ret
def derive_eq_by_near_row(Ads, b, verbose=0, col_lim=0, tweak=False):
'''
Derive equations by subtracting whole rows
Given equations like:
t0 >= 10
t0 + t1 >= 15
t0 + t1 + t2 >= 17
When I look at these, I think of a solution something like:
t0 = 10
t1 = 5
t2 = 2
However, linprog tends to choose solutions like:
t0 = 17
t1 = 0
t2 = 0
To this end, add additional constraints by finding equations that are subsets of other equations
How to do this in a reasonable time span?
Also equations are sparse, which makes this harder to compute
'''
rows = len(Ads)
assert rows == len(b)
rowdelta = int(rows / 2)
# Index equations into hash maps so can lookup sparse elements quicker
assert len(Ads) == len(b)
Ads_ret = copy.copy(Ads)
assert len(Ads) == len(Ads_ret)
#print('Finding subsets')
ltes = 0
scs = 0
b_ret = list(b)
sys.stdout.write('Deriving rows (%u) ' % rows)
sys.stdout.flush()
progress = int(max(1, rows / 100))
for row_refi, row_ref in enumerate(Ads):
if row_refi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
if col_lim and len(row_ref) > col_lim:
continue
#for row_cmpi, row_cmp in enumerate(Ads):
for row_cmpi in range(max(0, row_refi - rowdelta),
min(len(Ads), row_refi + rowdelta)):
row_cmp = Ads[row_cmpi]
if row_refi == row_cmpi or col_lim and len(row_cmp) > col_lim:
continue
# FIXME: this check was supposed to be removed
'''
Every element in row_cmp is in row_ref
but this doesn't mean the constants are smaller
Filter these out
'''
# XXX: just reduce and filter out solutions with positive constants
# or actually are these also useful as is?
lte = lte_const(row_ref, row_cmp)
if lte:
ltes += 1
sc = 0 and shared_const(row_ref, row_cmp)
if sc:
scs += 1
if lte or sc:
if verbose:
print('')
print('match')
print(' ', row_ref, b[row_refi])
print(' ', row_cmp, b[row_cmpi])
# Reduce
A_new = reduce_const(row_ref, row_cmp)
# Did this actually significantly reduce the search space?
#if tweak and len(A_new) > 4 and len(A_new) > len(row_cmp) / 2:
#if tweak and len(A_new) > 8 and len(A_new) > len(row_cmp) / 2:
# continue
b_new = b[row_refi] - b[row_cmpi]
# Definitely possible
# Maybe filter these out if they occur?
if verbose:
print(b_new)
# Also inverted sign
if b_new <= 0:
if verbose:
print("Unexpected b")
continue
if verbose:
print('OK')
Ads_ret.append(A_new)
b_ret.append(b_new)
print(' done')
#A_ub_ret = A_di2np(Ads2, cols=cols)
print(
'Derive row: %d => %d rows using %d lte, %d sc' %
(len(b), len(b_ret), ltes, scs))
assert len(Ads_ret) == len(b_ret)
return Ads_ret, b_ret
def derive_eq_by_col(Ads, b_ub, verbose=0):
'''
Derive equations by subtracting out all bounded constants (ie "known" columns)
'''
rows = len(Ads)
# Find and index equations with a single constraint
knowns = {}
sys.stdout.write('Derive col indexing ')
sys.stdout.flush()
progress = max(1, rows / 100)
for row_refi, row_refd in enumerate(Ads):
if row_refi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
if len(row_refd) == 1:
k, v = list(row_refd.items())[0]
# Reduce any constants to canonical form
if v != 1:
row_refd[k] = 1
b_ub[row_refi] /= v
knowns[k] = b_ub[row_refi]
print(' done')
#knowns_set = set(knowns.keys())
print('%d constrained' % len(knowns))
'''
Now see what we can do
Rows that are already constrained: eliminate
TODO: maybe keep these if this would violate their constraint
Otherwise eliminate the original row and generate a simplified result now
'''
b_ret = []
Ads_ret = []
sys.stdout.write('Derive col main ')
sys.stdout.flush()
progress = max(1, rows / 100)
for row_refi, row_refd in enumerate(Ads):
if row_refi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
# Reduce as much as possible
#row_new = {}
row_new = OrderedDict()
b_new = b_ub[row_refi]
# Copy over single entries
if len(row_refd) == 1:
row_new = row_refd
else:
for k, v in row_refd.items():
if k in knowns:
# Remove column and take out corresponding delay
b_new -= v * knowns[k]
# Copy over
else:
row_new[k] = v
# Possibly reduced all usable constants out
if len(row_new) == 0:
continue
if b_new <= 0:
continue
Ads_ret.append(row_new)
b_ret.append(b_new)
print(' done')
print('Derive col: %d => %d rows' % (len(b_ub), len(b_ret)))
return Ads_ret, b_ret
# iteratively increasing column limit until all columns are added
def massage_equations(Ads, b, verbose=False, corner=None):
'''
Subtract equations from each other to generate additional constraints
Helps provide additional guidance to solver for realistic delays
Ex: given:
a >= 10
a + b >= 100
A valid solution is:
a = 100
However, a better solution is something like
a = 10
b = 90
This creates derived constraints to provide more realistic results
Equation pipeline
Some operations may generate new equations
Simplify after these to avoid unnecessary overhead on redundant constraints
Similarly some operations may eliminate equations, potentially eliminating a column (ie variable)
Remove these columns as necessary to speed up solving
'''
assert len(Ads) == len(b), 'Ads, b length mismatch'
def debug(what):
if verbose:
print('')
print_eqns(Ads, b, verbose=verbose, label=what, lim=20)
col_dist(Ads, what)
# XXX: check_feasible_d expects index-keyed rows and a names list; disabled here
#check_feasible_d(Ads, b)
# Try to (intelligently) subtract equations to generate additional constraints
# This helps avoid putting all delay in a single shared variable
dstart = len(b)
cols = len(index_names(Ads))
# Each iteration one more column is allowed until all columns are included
# (and the system is stable)
col_lim = 15
di = 0
while True:
print('')
n_orig = len(b)
print('Loop %d, lim %d' % (di + 1, col_lim))
# Meat of the operation
Ads, b = derive_eq_by_row(Ads, b, col_lim=col_lim, tweak=True)
debug("der_rows")
# Run another simplify pass since new equations may have overlap with original
Ads, b = simplify_rows(Ads, b, corner=corner)
print('Derive row: %d => %d equations' % (n_orig, len(b)))
debug("der_rows simp")
n_orig2 = len(b)
# Meat of the operation
Ads, b = derive_eq_by_col(Ads, b)
debug("der_cols")
# Run another simplify pass since new equations may have overlap with original
Ads, b = simplify_rows(Ads, b, corner=corner)
print('Derive col %d: %d => %d equations' % (di + 1, n_orig2, len(b)))
debug("der_cols simp")
# Doesn't help computation, but helps debugging
Ads, b = sort_equations(Ads, b)
debug("loop done")
col_dist(Ads, 'derive done iter %d, lim %d' % (di, col_lim), lim=12)
rows = len(Ads)
# possible that a new equation was generated and taken away, but close enough
if n_orig == len(b) and col_lim >= cols:
break
col_lim += col_lim // 5
di += 1
dend = len(b)
print('')
print('Derive net: %d => %d' % (dstart, dend))
print('')
# Was experimenting to see how much the higher order columns really help
# Helps debug readability
Ads, b = sort_equations(Ads, b)
debug("final (sorted)")
print('')
print('Massage final: %d => %d rows' % (dstart, dend))
return Ads, b

View File

@ -0,0 +1,174 @@
#!/usr/bin/env python3
from timfuz import simplify_rows, loadc_Ads_b, index_names, A_ds2np, run_sub_json, print_eqns, Ads2bounds, instances, SimplifiedToZero, allow_zero_eqns
from timfuz_massage import massage_equations
import numpy as np
import sys
def check_feasible(A_ub, b_ub):
'''
Put large timing constants into the equations
See if that would solve it
It's having trouble giving me solutions as this gets bigger
Make a terrible baseline guess to confirm we aren't doing something bad
'''
sys.stdout.write('Check feasible ')
sys.stdout.flush()
rows = len(b_ub)
cols = len(A_ub[0])
progress = max(1, rows / 100)
'''
Delays should be in order of ns, so a 10 ns delay should be way above what anything should be
Series can have several hundred delay elements
Max delay in ballpark
'''
xs = [1e9 for _i in range(cols)]
# FIXME: use the correct np function to do this for me
# Verify bounds
#b_res = np.matmul(A_ub, xs)
#print(type(A_ub), type(xs)
#A_ub = np.array(A_ub)
#xs = np.array(xs)
#b_res = np.matmul(A_ub, xs)
def my_mul(A_ub, xs):
#print('cols', cols
#print('rows', rows
ret = [None] * rows
for row in range(rows):
this = 0
for col in range(cols):
this += A_ub[row][col] * xs[col]
ret[row] = this
return ret
b_res = my_mul(A_ub, xs)
# Verify bound was respected
for rowi, (this_b, this_b_ub) in enumerate(zip(b_res, b_ub)):
if rowi % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
if this_b >= this_b_ub or this_b > 0:
print(
'% 4d Want res % 10.1f <= % 10.1f <= 0' %
(rowi, this_b, this_b_ub))
raise Exception("Bad ")
print(' done')
def filter_bounds(Ads, b, bounds, corner):
'''Given min variable delays, remove rows that won't constrain solution'''
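# e.g. (illustrative, a max corner): with bounds {'a': 10, 'b': 5},
# a row a + b >= 12 is dropped since the bounds already imply it,
# while a + b >= 20 can still tighten the solution and is kept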
#assert len(bounds) > 0
if 'max' in corner:
# Keep delays possibly larger than current bound
def keep(row_b, est):
return row_b > est
T_UNK = 0
elif 'min' in corner:
# Keep delays possibly smaller than current bound
def keep(row_b, est):
return row_b < est
T_UNK = 1e9
else:
assert 0
ret_Ads = []
ret_b = []
for row_ds, row_b in zip(Ads, b):
# some variables get estimated at 0
est = sum([bounds.get(k, T_UNK) * v for k, v in row_ds.items()])
# will this row potentially constrain us more?
if keep(row_b, est):
ret_Ads.append(row_ds)
ret_b.append(row_b)
return ret_Ads, ret_b
def run(
fns_in,
corner,
run_corner,
sub_json=None,
sub_csv=None,
dedup=True,
massage=False,
outfn=None,
verbose=False,
**kwargs):
print('Loading data')
Ads, b = loadc_Ads_b(fns_in, corner, ico=True)
# Remove duplicate rows
# is this necessary?
# maybe better to just add them into the matrix directly
if dedup:
oldn = len(Ads)
iold = instances(Ads)
Ads, b = simplify_rows(Ads, b, corner=corner)
print('Simplify %u => %u rows' % (oldn, len(Ads)))
print('Simplify %u => %u instances' % (iold, instances(Ads)))
if sub_json:
print('Sub: %u rows' % len(Ads))
iold = instances(Ads)
names_old = index_names(Ads)
run_sub_json(Ads, sub_json, verbose=verbose)
names = index_names(Ads)
print("Sub: %u => %u names" % (len(names_old), len(names)))
print('Sub: %u => %u instances' % (iold, instances(Ads)))
else:
names = index_names(Ads)
'''
Substitution .csv
Special .csv containing one variable per line
Used primarily for multiple optimization passes, such as different algorithms or additional constraints
'''
if sub_csv:
Ads2, b2 = loadc_Ads_b([sub_csv], corner, ico=True)
bounds = Ads2bounds(Ads2, b2)
assert len(bounds), 'Failed to load bounds'
rows_old = len(Ads)
Ads, b = filter_bounds(Ads, b, bounds, corner)
print(
'Filter bounds: %s => %s + %s rows' %
(rows_old, len(Ads), len(Ads2)))
Ads = Ads + Ads2
b = b + b2
assert len(Ads) or allow_zero_eqns()
assert len(Ads) == len(b), 'Ads, b length mismatch'
if verbose:
print('')
print_eqns(Ads, b, verbose=verbose)
#print
#col_dist(A_ubd, 'final', names)
if massage:
try:
Ads, b = massage_equations(Ads, b, corner=corner)
except SimplifiedToZero:
if not allow_zero_eqns():
raise
print('WARNING: simplified to zero equations')
Ads = []
b = []
print('Converting to numpy...')
names, Anp = A_ds2np(Ads)
run_corner(
Anp,
np.asarray(b),
names,
corner,
outfn=outfn,
verbose=verbose,
**kwargs)

1
fuzzers/007-timing/timgrid/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
build

View File

@ -0,0 +1,12 @@
all: build/timgrid.json
build/timgrid.txt: generate.tcl
mkdir -p build
cd build && vivado -mode batch -source ../generate.tcl
build/timgrid.json: build/timgrid.txt
cd build && python3 ../tile_txt2json.py --speed-json ../../speed/build/speed.json timgrid.txt timgrid-s.json
clean:
rm -rf build

View File

@ -0,0 +1,16 @@
#!/bin/bash -x
source ${XRAY_GENHEADER}
vivado -mode batch -source ../generate.tcl
for x in design*.bit; do
${XRAY_BITREAD} -F $XRAY_ROI_FRAMES -o ${x}s -z -y $x
done
for x in design_*.bits; do
diff -u design.bits $x | grep '^[-+]bit' > ${x%.bits}.delta
done
python3 ../generate.py design_*.delta > tilegrid.json

View File

@ -0,0 +1,119 @@
proc build_project {} {
if 0 {
set grid_min_x -1
set grid_max_x -1
set grid_min_y -1
set grid_max_y -1
} {
set grid_min_x $::env(XRAY_ROI_GRID_X1)
set grid_max_x $::env(XRAY_ROI_GRID_X2)
set grid_min_y $::env(XRAY_ROI_GRID_Y1)
set grid_max_y $::env(XRAY_ROI_GRID_Y2)
}
create_project -force -part $::env(XRAY_PART) design design
read_verilog ../top.v
synth_design -top top
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_00) IOSTANDARD LVCMOS33" [get_ports clk]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_01) IOSTANDARD LVCMOS33" [get_ports di]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_02) IOSTANDARD LVCMOS33" [get_ports do]
set_property -dict "PACKAGE_PIN $::env(XRAY_PIN_03) IOSTANDARD LVCMOS33" [get_ports stb]
create_pblock roi
add_cells_to_pblock [get_pblocks roi] [get_cells roi]
resize_pblock [get_pblocks roi] -add "$::env(XRAY_ROI)"
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]
set_property BITSTREAM.GENERAL.PERFRAMECRC YES [current_design]
set_param tcl.collectionResultDisplayLimit 0
set_property CLOCK_DEDICATED_ROUTE FALSE [get_nets clk_IBUF]
set luts [get_bels -of_objects [get_sites -of_objects [get_pblocks roi]] -filter {TYPE =~ LUT*} */A6LUT]
set selected_luts {}
set lut_index 0
# LOC one LUT (a "selected_lut") into each CLB segment configuration column (ie 50 per column)
# Also, if GRID_MIN/MAX is not defined, automatically create it based on used CLBs
# See caveat in README on automatic creation
foreach lut $luts {
set tile [get_tile -of_objects $lut]
set grid_x [get_property GRID_POINT_X $tile]
set grid_y [get_property GRID_POINT_Y $tile]
if [expr $grid_min_x < 0 || $grid_x < $grid_min_x] {set grid_min_x $grid_x}
if [expr $grid_max_x < 0 || $grid_x > $grid_max_x] {set grid_max_x $grid_x}
if [expr $grid_min_y < 0 || $grid_y < $grid_min_y] {set grid_min_y $grid_y}
if [expr $grid_max_y < 0 || $grid_y > $grid_max_y] {set grid_max_y $grid_y}
# 50 per column => 50, 100, 150, etc
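        # e.g. (illustrative names) SLICE_X4Y100/A6LUT matches, SLICE_X4Y23/A6LUT does not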
if [regexp "Y(0|[0-9]*[05]0)/" $lut] {
            set cell [get_cells roi/is\[$lut_index\].lut]
set_property LOC [get_sites -of_objects $lut] $cell
set lut_index [expr $lut_index + 1]
lappend selected_luts $lut
}
}
place_design
route_design
write_checkpoint -force design.dcp
write_bitstream -force design.bit
}
proc write_data {} {
if 0 {
set grid_min_x -1
set grid_max_x -1
set grid_min_y -1
set grid_max_y -1
} {
set grid_min_x $::env(XRAY_ROI_GRID_X1)
set grid_max_x $::env(XRAY_ROI_GRID_X2)
set grid_min_y $::env(XRAY_ROI_GRID_Y1)
set grid_max_y $::env(XRAY_ROI_GRID_Y2)
}
# Get all tiles in ROI, ie not just the selected LUTs
set tiles [get_tiles -filter "GRID_POINT_X >= $grid_min_x && GRID_POINT_X <= $grid_max_x && GRID_POINT_Y >= $grid_min_y && GRID_POINT_Y <= $grid_max_y"]
# Write tiles.txt with site metadata
set fp [open "timgrid.txt" w]
foreach tile $tiles {
set type [get_property TYPE $tile]
set grid_x [get_property GRID_POINT_X $tile]
set grid_y [get_property GRID_POINT_Y $tile]
set items {}
set wires [get_wires -of_objects $tile]
if [llength $wires] {
foreach wire $wires {
set name [get_property NAME $wire]
set speed_index [get_property SPEED_INDEX $wire]
lappend items wire $name $speed_index
}
}
set pips [get_pips -of_objects $tile]
if [llength $pips] {
foreach pip $pips {
set name [get_property NAME $pip]
set speed_index [get_property SPEED_INDEX $pip]
lappend items pip $name $speed_index
}
}
puts $fp "$type $tile $grid_x $grid_y $items"
}
close $fp
}
build_project
write_data

View File

@ -0,0 +1,80 @@
#!/usr/bin/env python3
import sys
import os
import time
import json
SI_NONE = 0xFFFF
def load_speed_json(f):
j = json.load(f)
# Index speed indexes to names
speed_i2s = {}
for k, v in j['speed_model'].items():
i = v['speed_index']
if i != SI_NONE:
speed_i2s[i] = k
return j, speed_i2s
def gen_tiles(fnin, speed_i2s):
for l in open(fnin):
# lappend items pip $name $speed_index
# puts $fp "$type $tile $grid_x $grid_y $items"
parts = l.strip().split()
tile_type, tile_name, grid_x, grid_y = parts[0:4]
grid_x, grid_y = int(grid_x), int(grid_y)
tuples = parts[4:]
assert len(tuples) % 3 == 0
pips = {}
wires = {}
for i in range(0, len(tuples), 3):
ttype, name, speed_index = tuples[i:i + 3]
name_local = name.split('/')[1]
{
'pip': pips,
'wire': wires,
}[ttype][name_local] = speed_i2s[int(speed_index)]
yield (tile_type, tile_name, grid_x, grid_y, pips, wires)
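# Example with hypothetical values: a timgrid.txt line like
#   CLBLL_L CLBLL_L_X2Y3 10 20 wire CLBLL_L_X2Y3/SOME_WIRE 950 pip CLBLL_L_X2Y3/SOME_PIP 65
# yields ('CLBLL_L', 'CLBLL_L_X2Y3', 10, 20, {'SOME_PIP': <model name>}, {'SOME_WIRE': <model name>})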
def run(fnin, fnout, speed_json_fn, verbose=False):
speedj, speed_i2s = load_speed_json(open(speed_json_fn, 'r'))
tiles = {}
for tile in gen_tiles(fnin, speed_i2s):
(tile_type, tile_name, grid_x, grid_y, pips, wires) = tile
this_dat = {'pips': pips, 'wires': wires}
if tile_type not in tiles:
tiles[tile_type] = this_dat
else:
if tiles[tile_type] != this_dat:
print(tile_name, tile_type)
print(this_dat)
print(tiles[tile_type])
assert 0
j = {'tiles': tiles}
json.dump(
j, open(fnout, 'w'), sort_keys=True, indent=4, separators=(',', ': '))
def main():
import argparse
    parser = argparse.ArgumentParser(description='Convert timgrid.txt tile data to .json')
parser.add_argument(
'--speed-json',
default='../../speed/build/speed.json',
help='Provides speed index to name translation')
    parser.add_argument('fnin', default=None, help='input timgrid.txt (as written by generate.tcl)')
parser.add_argument('fnout', default=None, help='output .json')
args = parser.parse_args()
run(args.fnin, args.fnout, speed_json_fn=args.speed_json, verbose=False)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,49 @@
//Need at least one LUT per frame base address we want
`define N 100
module top(input clk, stb, di, output do);
localparam integer DIN_N = 6;
localparam integer DOUT_N = `N;
reg [DIN_N-1:0] din;
wire [DOUT_N-1:0] dout;
reg [DIN_N-1:0] din_shr;
reg [DOUT_N-1:0] dout_shr;
always @(posedge clk) begin
din_shr <= {din_shr, di};
dout_shr <= {dout_shr, din_shr[DIN_N-1]};
if (stb) begin
din <= din_shr;
dout_shr <= dout;
end
end
assign do = dout_shr[DOUT_N-1];
roi roi (
.clk(clk),
.din(din),
.dout(dout)
);
endmodule
module roi(input clk, input [5:0] din, output [`N-1:0] dout);
genvar i;
generate
for (i = 0; i < `N; i = i+1) begin:is
LUT6 #(
.INIT(64'h8000_0000_0000_0001 + (i << 16))
) lut (
.I0(din[0]),
.I1(din[1]),
.I2(din[2]),
.I3(din[3]),
.I4(din[4]),
.I5(din[5]),
.O(dout[i])
);
end
endgenerate
endmodule

View File

@ -0,0 +1,62 @@
`define N 100
module top(input clk, stb, di, output do);
localparam integer DIN_N = 8;
localparam integer DOUT_N = `N;
reg [DIN_N-1:0] din;
wire [DOUT_N-1:0] dout;
reg [DIN_N-1:0] din_shr;
reg [DOUT_N-1:0] dout_shr;
always @(posedge clk) begin
din_shr <= {din_shr, di};
dout_shr <= {dout_shr, din_shr[DIN_N-1]};
if (stb) begin
din <= din_shr;
dout_shr <= dout;
end
end
assign do = dout_shr[DOUT_N-1];
roi roi (
.clk(clk),
.din(din),
.dout(dout)
);
endmodule
module roi(input clk, input [7:0] din, output [`N-1:0] dout);
genvar i;
generate
for (i = 0; i < `N; i = i+1) begin:is
(* KEEP, DONT_TOUCH *)
RAMB36E1 #(.INIT_00(256'h0000000000000000000000000000000000000000000000000000000000000000)) ram (
.CLKARDCLK(din[0]),
.CLKBWRCLK(din[1]),
.ENARDEN(din[2]),
.ENBWREN(din[3]),
.REGCEAREGCE(din[4]),
.REGCEB(din[5]),
.RSTRAMARSTRAM(din[6]),
.RSTRAMB(din[7]),
.RSTREGARSTREG(din[0]),
.RSTREGB(din[1]),
.ADDRARDADDR(din[2]),
.ADDRBWRADDR(din[3]),
.DIADI(din[4]),
.DIBDI(din[5]),
.DIPADIP(din[6]),
.DIPBDIP(din[7]),
.WEA(din[0]),
.WEBWE(din[1]),
.DOADO(dout[0]),
.DOBDO(),
.DOPADOP(),
.DOPBDOP());
end
endgenerate
endmodule

View File

@ -0,0 +1,144 @@
#!/usr/bin/env python3
import sys
import os
import time
import json
from collections import OrderedDict
import timfuz
corner_s2i = OrderedDict(
[
('fast_max', 0),
('fast_min', 1),
('slow_max', 2),
('slow_min', 3),
])
corner2minmax = {
'fast_max': max,
'fast_min': min,
'slow_max': max,
'slow_min': min,
}
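# Merge rule implied by the table above: when the same element appears in several
# input files, *_max corners keep the larger estimate and *_min corners the
# smaller, so combined bounds only widen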
def build_tilejo(fnins):
'''
{
"tiles": {
"BRKH_B_TERM_INT": {
"pips": {},
"wires": {
"B_TERM_UTURN_INT_ER1BEG0": [
null,
null,
93,
null
],
'''
tilejo = {"tiles": {}}
for fnin in fnins:
tileji = json.load(open(fnin, 'r'))
for tilek, tilevi in tileji['tiles'].items():
# No previous data? Copy
tilevo = tilejo['tiles'].get(tilek, None)
if tilevo is None:
tilejo['tiles'][tilek] = tilevi
# Otherwise combine
else:
def process_type(etype):
for pipk, pipvi in tilevi[etype].items():
pipvo = tilevo[etype][pipk]
for cornerk, corneri in corner_s2i.items():
cornervo = pipvo[corneri]
cornervi = pipvi[corneri]
# no new data
if cornervi is None:
pass
# no previous data
elif cornervo is None:
pipvo[corneri] = cornervi
# combine
else:
minmax = corner2minmax[cornerk]
pipvo[corneri] = minmax(cornervi, cornervo)
process_type('pips')
process_type('wires')
return tilejo
def check_corner_minmax(tilej, verbose=False):
# Post processing pass looking for min/max inconsistencies
# Especially an issue due to complexities around under-constrained elements
# (ex: pivots set to 0 delay)
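    # e.g. (hypothetical values) an element reporting slow_min=120 but slow_max=100
    # gets the pair swapped below so that min <= max always holds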
print('Checking for min/max consistency')
checks = 0
bad = 0
for tilev in tilej['tiles'].values():
def process_type(etype):
nonlocal checks
nonlocal bad
for pipk, pipv in tilev[etype].items():
for corner in ('slow', 'fast'):
mini = corner_s2i[corner + '_min']
minv = pipv[mini]
maxi = corner_s2i[corner + '_max']
maxv = pipv[maxi]
if minv is not None and maxv is not None:
checks += 1
if minv > maxv:
if verbose:
print(
'WARNING: element %s %s min/max adjusted on corner %s'
% (etype, pipk, corner))
bad += 1
pipv[mini] = maxv
pipv[maxi] = minv
process_type('pips')
process_type('wires')
print('')
    print('minmax: %u / %u bad pairs adjusted' % (bad, checks))
timfuz.tilej_stats(tilej)
def check_corners_minmax(tilej, verbose=False):
# TODO: check fast vs slow
pass
def run(fnins, fnout, verbose=False):
tilejo = build_tilejo(fnins)
check_corner_minmax(tilejo)
check_corners_minmax(tilejo)
json.dump(
tilejo,
open(fnout, 'w'),
sort_keys=True,
indent=4,
separators=(',', ': '))
def main():
import argparse
parser = argparse.ArgumentParser(
description='Combine multiple tile corners into one .json file')
parser.add_argument(
'--out', required=True, help='Combined timgrid-v.json files')
parser.add_argument('fnins', nargs='+', help='Input timgrid-vc.json files')
args = parser.parse_args()
run(args.fnins, args.out, verbose=False)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,293 @@
#!/usr/bin/env python3
from timfuz import Benchmark, A_di2ds
import glob
import math
import json
import sys
from collections import OrderedDict
# Speed index: some sort of special value
SI_NONE = 0xFFFF
# prefix to make it easier to track
# models do not overlap between PIPs and WIREs
PREFIX_W = 'WIRE_'
PREFIX_P = 'PIP_'
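# e.g. a wire speed model named SOME_MODEL (hypothetical name) is tracked as
# WIRE_SOME_MODEL, and a pip model of the same name as PIP_SOME_MODEL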
def parse_pip(s):
# Entries like
# CLK_BUFG_REBUF_X60Y117/CLK_BUFG_REBUF.CLK_BUFG_REBUF_R_CK_GCLK0_BOT<<->>CLK_BUFG_REBUF_R_CK_GCLK0_TOP
    # Convert to (site, instance, speed_index)
pipstr, speed_index = s.split(':')
speed_index = int(speed_index)
site, instance = pipstr.split('/')
#type, pip_junction, pip = others.split('.')
#return (site, type, pip_junction, pip)
    return site, instance, speed_index
def parse_node(s):
node, nwires = s.split(':')
return node, int(nwires)
def parse_wire(s):
# CLBLM_R_X3Y80/CLBLM_M_D6:952
wirestr, speed_index = s.split(':')
site, instance = wirestr.split('/')
return site, instance, int(speed_index)
# FIXME: these actually have a delay element
# Probably need to put these back in
def remove_virtual_pips(pips):
return pips
    # Disabled for now; re-enabling would also need an "import re" in this file
    #return filter(lambda pip: not re.match(r'CLBL[LM]_[LR]_', pip[0]), pips)
def load_timing3(f, name='file'):
# src_bel dst_bel ico fast_max fast_min slow_max slow_min pips
f.readline()
ret = []
bads = 0
for l in f:
# FIXME: hack
if 0 and 'CLK' in l:
continue
l = l.strip()
if not l:
continue
parts = l.split(' ')
# FIXME: deal with these nodes
if len(parts) != 11:
bads += 1
continue
net, src_bel, dst_bel, ico, fast_max, fast_min, slow_max, slow_min, pips, nodes, wires = parts
pips = pips.split('|')
nodes = nodes.split('|')
wires = wires.split('|')
ret.append(
{
'net': net,
'src_bel': src_bel,
'dst_bel': dst_bel,
'ico': int(ico),
# ps
'fast_max': int(fast_max),
'fast_min': int(fast_min),
'slow_max': int(slow_max),
'slow_min': int(slow_min),
'pips': remove_virtual_pips([parse_pip(pip) for pip in pips]),
'nodes': [parse_node(node) for node in nodes],
'wires': [parse_wire(wire) for wire in wires],
'line': l,
})
print(' load %s: %d bad, %d good' % (name, bads, len(ret)))
#assert 0
return ret
def load_speed_json(f):
j = json.load(f)
# Index speed indexes to names
speed_i2s = {}
for k, v in j['speed_model'].items():
i = v['speed_index']
if i != SI_NONE:
speed_i2s[i] = k
return j, speed_i2s
# Verify the nodes and wires really do line up
def vals2Adi_check(vals, names):
print('Checking')
for val in vals:
node_wires = 0
for _node, wiresn in val['nodes']:
node_wires += wiresn
assert node_wires == len(val['wires'])
print('Done')
assert 0
def vals2Adi(vals, speed_i2s, name_tr={}, name_drop=[], verbose=False):
def pip2speed(pip):
_site, _name, speed_index = pip
return PREFIX_P + speed_i2s[speed_index]
def wire2speed(wire):
_site, _name, speed_index = wire
return PREFIX_W + speed_i2s[speed_index]
# Want this ordered
names = OrderedDict()
print(
'Creating matrix w/ tr: %d, drop: %d' % (len(name_tr), len(name_drop)))
# Take sites out entirely using handy "interconnect only" option
#vals = filter(lambda x: str(x).find('SLICE') >= 0, vals)
# Highest count while still getting valid result
# First index all of the given pip types
    # Collected in an OrderedDict, then converted to a list to keep matrix order consistent
sys.stdout.write('Indexing delay elements ')
sys.stdout.flush()
    progress = max(1, len(vals) // 100)
for vali, val in enumerate(vals):
if vali % progress == 0:
sys.stdout.write('.')
sys.stdout.flush()
odl = [(pip2speed(pip), None) for pip in val['pips']]
names.update(OrderedDict(odl))
odl = [(wire2speed(wire), None) for wire in val['wires']]
names.update(OrderedDict(odl))
print(' done')
# Apply transform
orig_names = len(names)
for k in (list(name_drop) + list(name_tr.keys())):
if k in names:
del names[k]
else:
print('WARNING: failed to remove %s' % k)
names.update(OrderedDict([(name, None) for name in name_tr.values()]))
print('Names tr %d => %d' % (orig_names, len(names)))
# Make unique list
names = list(names.keys())
name_s2i = {}
for namei, name in enumerate(names):
name_s2i[name] = namei
if verbose:
for name in names:
print('NAME: ', name)
for name in name_drop:
print('DROP: ', name)
for l, r in name_tr.items():
print('TR: %s => %s' % (l, r))
# Now create a matrix with all of these delays
    # Each row maps delay element index => occurrence count
    # 2 means the element appears twice; missing keys mean absent
    # (a path could hit the same pip type twice)
print('Creating delay element matrix w/ %d names' % len(names))
Adi = [None for _i in range(len(vals))]
for vali, val in enumerate(vals):
def add_name(name):
if name in name_drop:
return
name = name_tr.get(name, name)
namei = name_s2i[name]
row_di[namei] = row_di.get(namei, 0) + 1
        # Start with 0 occurrences
#row = [0 for _i in range(len(names))]
row_di = {}
#print('pips: ', val['pips']
for pip in val['pips']:
add_name(pip2speed(pip))
for wire in val['wires']:
add_name(wire2speed(wire))
#A_ub.append(row)
Adi[vali] = row_di
return Adi, names
# TODO: load directly as Ads
# remove names_tr, names_drop
def vals2Ads(vals, speed_i2s, verbose=False):
Adi, names = vals2Adi(vals, speed_i2s, verbose=False)
return A_di2ds(Adi, names)
def load_Ads(speed_json_f, f_ins):
print('Loading data')
_speedj, speed_i2s = load_speed_json(speed_json_f)
vals = []
for avals in [load_timing3(f_in, name) for f_in, name in f_ins]:
vals.extend(avals)
Ads = vals2Ads(vals, speed_i2s)
def mkb(val):
return (
val['fast_max'], val['fast_min'], val['slow_max'], val['slow_min'])
b = [mkb(val) for val in vals]
ico = [val['ico'] for val in vals]
return Ads, b, ico
def run(speed_json_f, fout, f_ins, verbose=0, corner=None):
Ads, bs, ico = load_Ads(speed_json_f, f_ins)
fout.write('ico,fast_max fast_min slow_max slow_min,rows...\n')
for row_bs, row_ds, row_ico in zip(bs, Ads, ico):
        # like: 1,123 456 120 450,1 a,2 b
        # ico flag, then the four delay corners, then delay element count/name pairs
items = [str(row_ico), ' '.join([str(x) for x in row_bs])]
for k, v in sorted(row_ds.items()):
items.append('%u %s' % (v, k))
fout.write(','.join(items) + '\n')
def main():
import argparse
parser = argparse.ArgumentParser(
description=
        'Convert obscure timing3.txt into more readable but roughly equivalent timing3.csv'
)
parser.add_argument('--verbose', type=int, help='')
# made a bulk conversion easier...keep?
parser.add_argument(
'--auto-name', action='store_true', help='timing3.txt => timing3.csv')
parser.add_argument(
'--speed-json',
default='build_speed/speed.json',
help='Provides speed index to name translation')
parser.add_argument('--out', default=None, help='Output timing3.csv file')
parser.add_argument('fns_in', nargs='+', help='Input timing3.txt files')
args = parser.parse_args()
bench = Benchmark()
fnout = args.out
if fnout is None:
if args.auto_name:
assert len(args.fns_in) == 1
fnin = args.fns_in[0]
fnout = fnin.replace('.txt', '.csv')
assert fnout != fnin, 'Expect .txt in'
else:
# practically there are too many stray prints to make this work as expected
assert 0, 'File name required'
            #fnout = '/dev/stdout'  # unreachable while the assert above is in place
print("Writing to %s" % fnout)
fout = open(fnout, 'w')
fns_in = args.fns_in
if not fns_in:
fns_in = glob.glob('specimen_*/timing3.txt')
run(
speed_json_f=open(args.speed_json, 'r'),
fout=fout,
f_ins=[(open(fn_in, 'r'), fn_in) for fn_in in fns_in],
verbose=args.verbose)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,102 @@
import re
def gen_nodes(fin):
for l in fin:
lj = {}
l = l.strip()
for kvs in l.split():
name, value = kvs.split(':')
'''
NAME:LIOB33_SING_X0Y199/IOB_IBUF0
IS_BAD:0
IS_COMPLETE:1
IS_GND:0 IS_INPUT_PIN:1 IS_OUTPUT_PIN:0 IS_PIN:1 IS_VCC:0
NUM_WIRES:2
PIN_WIRE:1
'''
if name in ('COST_CODE', 'SPEED_CLASS'):
value = int(value)
lj[name] = value
tile_type, xy, wname = re.match(
r'(.*)_(X[0-9]*Y[0-9]*)/(.*)', lj['NAME']).groups()
lj['tile_type'] = tile_type
lj['xy'] = xy
lj['wname'] = wname
lj['l'] = l
yield lj
def run(node_fin, verbose=0):
refnodes = {}
nodei = 0
for nodei, anode in enumerate(gen_nodes(node_fin)):
def getk(anode):
return anode['wname']
#return (anode['tile_type'], anode['wname'])
if nodei % 1000 == 0:
            print('Check node %d' % nodei)
# Existing node?
try:
refnode = refnodes[getk(anode)]
except KeyError:
# Set as reference
refnodes[getk(anode)] = anode
continue
        # Verify equivalence
for k in (
'SPEED_CLASS',
'COST_CODE',
'COST_CODE_NAME',
'IS_BAD',
'IS_COMPLETE',
'IS_GND',
'IS_VCC',
):
if k in refnode and k in anode:
                def fail():
                    print('Mismatch on %s' % k)
                    print(refnode[k], anode[k])
                    print(refnode['l'])
                    print(anode['l'])
#assert 0
                if k == 'SPEED_CLASS':
                    # Parameters known to affect SPEED_CLASS
                    # Verify at least one parameter is different
                    if refnode[k] != anode[k]:
                        for k2 in ('IS_PIN', 'IS_INPUT_PIN', 'IS_OUTPUT_PIN',
                                   'PIN_WIRE', 'NUM_WIRES'):
                            if refnode[k2] != anode[k2]:
                                break
                        else:
                            if 0:
                                print()
                            fail()
                elif refnode[k] != anode[k]:
                    print()
                    fail()
# A key in one but not the other?
elif k in refnode or k in anode:
assert 0
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(
description=
'Determines which info is consistent across nodes with the same name')
parser.add_argument('--verbose', type=int, help='')
parser.add_argument(
'node_fn_in', default='/dev/stdin', nargs='?', help='Input file')
args = parser.parse_args()
run(open(args.node_fn_in, 'r'), verbose=args.verbose)

View File

@ -0,0 +1,143 @@
#!/usr/bin/env python3
'''
Matrix solving performance tests: benchmarks sympy's rref() on random
matrices under different numeric encodings
'''
from timfuz import Benchmark, Ar_di2np, loadc_Ads_b, index_names, A_ds2np, simplify_rows
import numpy as np
import glob
import math
import json
import sympy
from collections import OrderedDict
from fractions import Fraction
import random
from sympy import Rational
def intr(r):
DELTA = 0.0001
for i, x in enumerate(r):
if type(x) is float:
xi = int(x)
assert abs(xi - x) < DELTA
r[i] = xi
def fracr(r):
intr(r)
return [Fraction(x) for x in r]
def fracm(m):
return [fracr(r) for r in m]
def symratr(r):
intr(r)
return [Rational(x) for x in r]
def symratm(m):
return [symratr(r) for r in m]
def intm(m):
[intr(r) for r in m]
return m
def create_matrix(rows, cols):
ret = np.zeros((rows, cols))
for rowi in range(rows):
for coli in range(cols):
ret[rowi][coli] = random.randint(1, 10)
return ret
def create_matrix_sparse(rows, cols):
ret = np.zeros((rows, cols))
for rowi in range(rows):
for coli in range(cols):
if random.randint(0, 5) < 1:
ret[rowi][coli] = random.randint(1, 10)
return ret
def run(
rows=35,
cols=200,
verbose=False,
encoding='np',
sparse=False,
normalize_last=True):
random.seed(0)
if sparse:
mnp = create_matrix_sparse(rows, cols)
else:
mnp = create_matrix(rows, cols)
#print(mnp[0])
if encoding == 'fraction':
msym = sympy.Matrix(fracm(mnp))
elif encoding == 'np':
msym = sympy.Matrix(mnp)
elif encoding == 'sympy':
msym = sympy.Matrix(symratm(mnp))
# this actually produces float results
elif encoding == 'int':
msym = sympy.Matrix(intm(mnp))
else:
assert 0, 'bad encoding: %s' % encoding
print(type(msym[0]), str(msym[0]))
    if verbose:
        print('Matrix')
        sympy.pprint(msym)
print(
'%s matrix, %u rows x %u cols, sparse: %s, normlast: %s' %
(encoding, len(mnp), len(mnp[0]), sparse, normalize_last))
bench = Benchmark()
try:
rref, pivots = msym.rref(normalize_last=normalize_last)
finally:
print('rref exiting after %s' % bench)
print(type(rref[0]), str(rref[0]))
if verbose:
print('Pivots')
sympy.pprint(pivots)
print('rref')
sympy.pprint(rref)
def main():
import argparse
parser = argparse.ArgumentParser(
description='Matrix solving performance tests')
parser.add_argument('--verbose', action='store_true', help='')
parser.add_argument('--sparse', action='store_true', help='')
    parser.add_argument('--rows', type=int, default=35, help='')
    parser.add_argument('--cols', type=int, default=200, help='')
    parser.add_argument('--normalize-last', type=int, default=1, help='')
parser.add_argument('--encoding', default='np', help='')
args = parser.parse_args()
run(
encoding=args.encoding,
rows=args.rows,
cols=args.cols,
sparse=args.sparse,
normalize_last=bool(args.normalize_last),
verbose=args.verbose)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,86 @@
import re
def gen_wires(fin):
for l in fin:
lj = {}
l = l.strip()
for kvs in l.split():
name, value = kvs.split(':')
lj[name] = value
tile_type, xy, wname = re.match(
r'(.*)_(X[0-9]*Y[0-9]*)/(.*)', lj['NAME']).groups()
lj['tile_type'] = tile_type
lj['xy'] = xy
lj['wname'] = wname
lj['l'] = l
yield lj
def run(node_fin, verbose=0):
refnodes = {}
nodei = 0
for nodei, anode in enumerate(gen_wires(node_fin)):
def getk(anode):
return anode['wname']
            #return (anode['tile_type'], anode['wname'])
if nodei % 1000 == 0:
            print('Check node %d' % nodei)
# Existing node?
try:
refnode = refnodes[getk(anode)]
except KeyError:
# Set as reference
refnodes[getk(anode)] = anode
continue
k_invariant = (
'CAN_INVERT',
'IS_BUFFERED_2_0',
'IS_BUFFERED_2_1',
'IS_DIRECTIONAL',
'IS_EXCLUDED_PIP',
'IS_FIXED_INVERSION',
'IS_INVERTED',
'IS_PSEUDO',
'IS_SITE_PIP',
'IS_TEST_PIP',
)
k_varies = ('TILE', )
        # Verify equivalence
for k in k_invariant:
if k in refnode and k in anode:
                def fail():
                    print('Mismatch on %s' % k)
                    print(refnode[k], anode[k])
                    print(refnode['l'])
                    print(anode['l'])
#assert 0
if refnode[k] != anode[k]:
                    print()
fail()
# A key in one but not the other?
elif k in refnode or k in anode:
assert 0
elif k not in k_varies:
assert 0
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(
description=
'Determines which info is consistent across PIPs with the same name')
parser.add_argument('--verbose', type=int, help='')
parser.add_argument(
'node_fn_in', default='/dev/stdin', nargs='?', help='Input file')
args = parser.parse_args()
run(open(args.node_fn_in, 'r'), verbose=args.verbose)

View File

@ -0,0 +1,91 @@
import re
def gen_wires(fin):
for l in fin:
lj = {}
l = l.strip()
for kvs in l.split():
name, value = kvs.split(':')
lj[name] = value
tile_type, xy, wname = re.match(
r'(.*)_(X[0-9]*Y[0-9]*)/(.*)', lj['NAME']).groups()
lj['tile_type'] = tile_type
lj['xy'] = xy
lj['wname'] = wname
lj['l'] = l
yield lj
def run(node_fin, verbose=0):
refnodes = {}
nodei = 0
for nodei, anode in enumerate(gen_wires(node_fin)):
def getk(anode):
return anode['wname']
#return (anode['tile_type'], anode['wname'])
if nodei % 1000 == 0:
            print('Check node %d' % nodei)
# Existing node?
try:
refnode = refnodes[getk(anode)]
except KeyError:
# Set as reference
refnodes[getk(anode)] = anode
continue
k_invariant = (
'COST_CODE',
'IS_INPUT_PIN',
'IS_OUTPUT_PIN',
'IS_PART_OF_BUS',
'NUM_INTERSECTS',
'NUM_TILE_PORTS',
'SPEED_INDEX',
'TILE_PATTERN_OFFSET',
)
k_varies = (
'ID_IN_TILE_TYPE',
'IS_CONNECTED',
'NUM_DOWNHILL_PIPS',
'NUM_PIPS',
'NUM_UPHILL_PIPS',
'TILE_NAME',
)
        # Verify equivalence
for k in k_invariant:
if k in refnode and k in anode:
                def fail():
                    print('Mismatch on %s' % k)
                    print(refnode[k], anode[k])
                    print(refnode['l'])
                    print(anode['l'])
#assert 0
if refnode[k] != anode[k]:
                    print()
fail()
# A key in one but not the other?
elif k in refnode or k in anode:
assert 0
elif k not in k_varies:
assert 0
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(
description=
'Determines which info is consistent across wires with the same name')
parser.add_argument('--verbose', type=int, help='')
parser.add_argument(
'node_fn_in', default='/dev/stdin', nargs='?', help='Input file')
args = parser.parse_args()
run(open(args.node_fn_in, 'r'), verbose=args.verbose)