Created 2019 02 05 (markdown)

2019-02-06 01:19:34 +01:00 · 2019-02-06 01:19:34 +01:00 · 5cb5b6a2fe
parent 06724976ce
commit 5cb5b6a2fe
1 changed files with 214 additions and 0 deletions
--- a/2019-02-05.md
+++ b/2019-02-05.md
@ -0,0 +1,214 @@
+# Hierarchical merge and performance on a real testcase
+
+I'm starting the make the hierarchical processing features available to the Region class (so far, only booleans were supported). First achievement is the hierarchical merge function. "Merge" is an operation in which all connected polygons are joined into a big single one. The purpose is simplification and analysis - for example selecting polygons by area can only happen after merging. 
+
+Hierarchical merge utilizes the net formation feature and connects polygons if they touch. Each such connected cluster forms one merged polygon. The merged polygon will appear at the lowest possible level in the hierarchy at which all constituents are known.
+
+I added a small sample P&R block (thanks to Jean-Paul Chaput for allowing me to use it) into the project. This block has about 8k standard cells. The design rules are symbolic 1µm rules, but the hierarchy is realistic. 
+
+Place-and-route blocks have a hierarchical structure of standard cells, sometimes also via cells or tie-down diodes. The routine usually is within the top cell of the block and a flat arrangement of metal wires (paths, polygons) and vias rectangles (unless these are cells on their own).
+
+The layer table is:
+
+```
+1     n well
+3     n implant (source/drain)
+4     p implant (source/drain)
+5     n implant (tie-down)
+6     p implant (tie-down)
+7     poly
+10    contact
+11    metal 1
+14    via 1
+16    metal 2
+18    via 2
+19    metal 3
+21    via 3
+22    metal 4
+25    via 4
+26    metal 5
+```
+
+## Hierarchical processing code
+
+Here is some code that uses the merge feature to merge the shapes from these layout's layers. It also performs boolean operations for forming source/drain and gate areas and separating poly from diffusion contacts. Both operations happen in flat and deep mode (controlled by the $flat global variable). You can set this flag externally using
+
+``` 
+klayout -b -r this_script.rb -rd flat=1
+```
+
+All operations are repeated 10 times, because otherwise the timer resolution won't be sufficient to capture the 
+
+Here is the script:
+
+```
+dss = RBA::DeepShapeStore::new
+
+ly = RBA::Layout::new(false)
+# run from the repo's root for this file:
+ly.read("testdata/algo/vexriscv_clocked_r.oas.gz")
+
+flat = ($flat.to_i != 0)
+
+class TimedSection
+
+  def initialize(what, &block)
+
+    $stdout.write(what + " ... ")
+    $stdout.flush
+
+    timer = RBA::Timer::new
+    timer.start
+
+    # 10 times for more precise measurements
+    10.times do 
+      yield 
+    end
+    
+    timer.stop
+    
+    $stdout.write(timer.to_s + "\n")
+    $stdout.flush
+
+  end
+
+end
+
+puts "Running merge in #{flat ? %q/flat/ : %q/deep/} mode"
+
+ly.layer_indexes.sort { |a,b| ly.get_info(a).layer <=> ly.get_info(b).layer }.each do |li|
+  
+  ld = ly.get_info(li)
+  
+  lres = ly.layer(1000 + ld.layer, ld.datatype)
+  iter = ly.top_cell.begin_shapes_rec(li)
+    
+  merged = nil
+  TimedSection::new((flat ? "Flat" : "Deep") + " merge of layer " + ld.to_s) do
+    region = flat ? RBA::Region::new(iter) : RBA::Region::new(iter, dss)
+    merged = region.merged
+  end
+  
+  ly.insert(ly.top_cell.cell_index, lres, merged)
+  
+end
+
+nactive, pactive, poly, cont = [ 3, 4, 7, 10 ].collect do |layer|
+
+  iter = ly.top_cell.begin_shapes_rec(ly.layer(layer, 0))
+  flat ? RBA::Region::new(iter) : RBA::Region::new(iter, dss)
+
+end
+
+nsd = nil
+TimedSection::new((flat ? "Flat" : "Deep") + " nactive-poly ") do 
+  nsd = nactive - poly
+end
+  
+ly.insert(ly.top_cell.cell_index, ly.layer(1200, 0), nsd)
+
+psd = nil
+TimedSection::new((flat ? "Flat" : "Deep") + " pactive-poly ") do 
+  psd = pactive - poly
+end
+  
+ly.insert(ly.top_cell.cell_index, ly.layer(1201, 0), psd)
+
+ngate = nil
+TimedSection::new((flat ? "Flat" : "Deep") + " nactive&poly ") do 
+  ngate = nactive & poly
+end
+  
+ly.insert(ly.top_cell.cell_index, ly.layer(1202, 0), ngate)
+
+pgate = nil
+TimedSection::new((flat ? "Flat" : "Deep") + " pactive&poly ") do 
+  pgate = pactive & poly
+end
+  
+ly.insert(ly.top_cell.cell_index, ly.layer(1203, 0), pgate)
+
+poly_cont = nil
+TimedSection::new((flat ? "Flat" : "Deep") + " cont&poly ") do 
+  poly_cont = cont & poly
+end
+  
+ly.insert(ly.top_cell.cell_index, ly.layer(1204, 0), poly_cont)
+
+diff_cont = nil
+TimedSection::new((flat ? "Flat" : "Deep") + " cont-poly ") do 
+  diff_cont = cont - poly
+end
+  
+ly.insert(ly.top_cell.cell_index, ly.layer(1205, 0), diff_cont)
+
+ly.write("merged_#{flat ? %q/flat/ : %q/deep/}.oas.gz")
+```
+
+## Performance numbers 
+
+The performance numbers on my laptop are:
+
+Flat:
+
+```
+Flat merge of layer 1/0 ... 0.13s (sys), 11.05s (user), 11.177s (wall)
+Flat merge of layer 3/0 ... 0s (sys), 6.81s (user), 6.808s (wall)
+Flat merge of layer 4/0 ... 0s (sys), 8.9s (user), 8.899s (wall)
+Flat merge of layer 5/0 ... 0s (sys), 1.3s (user), 1.3s (wall)
+Flat merge of layer 6/0 ... 0s (sys), 1.23s (user), 1.232s (wall)
+Flat merge of layer 7/0 ... 0.15s (sys), 13.7s (user), 13.851s (wall)
+Flat merge of layer 10/0 ... 0.15s (sys), 7.92s (user), 8.076s (wall)
+Flat merge of layer 11/0 ... 0.27s (sys), 22.93s (user), 23.204s (wall)
+Flat merge of layer 14/0 ... 0s (sys), 0.82s (user), 0.821s (wall)
+Flat merge of layer 16/0 ... 0s (sys), 1.86s (user), 1.865s (wall)
+Flat merge of layer 18/0 ... 0.01s (sys), 0.84s (user), 0.848s (wall)
+Flat merge of layer 19/0 ... 0.01s (sys), 2.07s (user), 2.081s (wall)
+Flat merge of layer 21/0 ... 0s (sys), 0.06s (user), 0.062s (wall)
+Flat merge of layer 22/0 ... 0s (sys), 0.09s (user), 0.094s (wall)
+Flat merge of layer 24/0 ... 0.02s (sys), 5.43s (user), 5.446s (wall)
+Flat merge of layer 25/0 ... 0s (sys), 0.05s (user), 0.054s (wall)
+Flat merge of layer 26/0 ... 0s (sys), 0.11s (user), 0.103s (wall)
+Flat merge of layer 255/0 ... 0.02s (sys), 2.27s (user), 2.287s (wall)
+Flat nactive-poly  ... 0.46s (sys), 21.98s (user), 22.44s (wall)
+Flat pactive-poly  ... 0.48s (sys), 25.67s (user), 26.146s (wall)
+Flat nactive&poly  ... 1.02s (sys), 22.93s (user), 23.941s (wall)
+Flat pactive&poly  ... 0.58s (sys), 24.68s (user), 25.268s (wall)
+Flat cont&poly  ... 0.26s (sys), 20.98s (user), 21.237s (wall)
+Flat cont-poly  ... 0.26s (sys), 22.5s (user), 22.762s (wall)
+```
+
+Hierarchical:
+
+```
+Deep merge of layer 1/0 ... 0.01s (sys), 1.75s (user), 1.744s (wall)
+Deep merge of layer 3/0 ... 0s (sys), 0.29s (user), 0.296s (wall)
+Deep merge of layer 4/0 ... 0s (sys), 0.32s (user), 0.32s (wall)
+Deep merge of layer 5/0 ... 0s (sys), 0.25s (user), 0.253s (wall)
+Deep merge of layer 6/0 ... 0s (sys), 0.23s (user), 0.239s (wall)
+Deep merge of layer 7/0 ... 0s (sys), 0.39s (user), 0.386s (wall)
+Deep merge of layer 10/0 ... 0s (sys), 0.49s (user), 0.496s (wall)
+Deep merge of layer 11/0 ... 0.03s (sys), 12.76s (user), 12.793s (wall)
+Deep merge of layer 14/0 ... 0.02s (sys), 3.54s (user), 3.559s (wall)
+Deep merge of layer 16/0 ... 0.01s (sys), 4.56s (user), 4.583s (wall)
+Deep merge of layer 18/0 ... 0.03s (sys), 3.51s (user), 3.532s (wall)
+Deep merge of layer 19/0 ... 0.01s (sys), 3.23s (user), 3.252s (wall)
+Deep merge of layer 21/0 ... 0s (sys), 0.33s (user), 0.331s (wall)
+Deep merge of layer 22/0 ... 0s (sys), 0.33s (user), 0.338s (wall)
+Deep merge of layer 24/0 ... 0s (sys), 0.54s (user), 0.547s (wall)
+Deep merge of layer 25/0 ... 0s (sys), 0.3s (user), 0.299s (wall)
+Deep merge of layer 26/0 ... 0s (sys), 0.31s (user), 0.315s (wall)
+Deep merge of layer 255/0 ... 0.01s (sys), 6.07s (user), 6.073s (wall)
+Deep nactive-poly  ... 0s (sys), 1.26s (user), 1.275s (wall)
+Deep pactive-poly  ... 0s (sys), 1.2s (user), 1.2s (wall)
+Deep nactive&poly  ... 0s (sys), 1.18s (user), 1.183s (wall)
+Deep pactive&poly  ... 0s (sys), 1.16s (user), 1.17s (wall)
+Deep cont&poly  ... 0.01s (sys), 1.25s (user), 1.265s (wall)
+Deep cont-poly  ... 0s (sys), 1.31s (user), 1.306s (wall)
+```
+
+So overall a great improvement in the hierarchical frontend layers (factor 5 to 20 in layers 1-10), a roughly factor 2 in layer 11 (metal 1) which has both hierarchical parts in the local wiring of the standard cells, but also flat parts from the power routing and pins. There is a 2-3x disadvantage of the hierarchical implementation in the flat layers (14-26) because the hierarchical analysis demands some overhead and there is no gain for the merge computation itself.
+
+But the booleans pay this effort back with a 8x performance boost. That's because they are mainly frontend-driven and benefit from the hierarchical nature of the standard cells.
+
+