Having abandoned the attempt to redefine a split tile as two separate tile entries, I am returning to the problem of removing TT_SIDE from the database. TT_SIDE has been used to pass information to callback routines to indicate which side of a tile should be processed. This should never have been done, because it causes the database to be altered during searches, which prevents searches from being parallelized.

To remove this dependency: All the basic search functions will require an additional argument which indicates which side of a split tile is to be processed by the callback. A boolean would be enough, but it is probably fine to treat the argument like "dinfo", which is to make it a TileType and set TT_DIAGONAL, TT_SIDE, and TT_DIRECTION as needed. For the basic use of callbacks, it will generally suffice to set only TT_SIDE.

-----------

Both TT_SIDE and SplitSide() are used frequently, which is why I have not done this before.

First, enumerate all callback functions which use each of the above, and the search routine that will require the extra argument.

Search functions or functions needing changes:
(Followed by the number of occurrences)

    DBSrPaintArea() 401
    DBTreeSrTiles() 99
    DBSrPaintNMArea() 16
    DBSrPaintClient() 15
    SimTreeSrTiles()
    SelEnumPaint()
    GrClipTriangle()
    GrDrawTriangleEdge()
    GrDiagonal()
    DBTransformDiagonal()
    GrBox()

SplitSide():

    calmaMergePaintFunc() --- Comment, unused, remove.
    calmaWritePaintFunc()
    calmaMergePaintFuncZ() --- Comment, unused, remove.
    calmaWritePaintFuncZ()
    cifHierCopyFunc()
    cifHierErrorFunc()
    cmdDropPaintCell()

To be fixed:

    STACKPUSH/STACKPOP stuff in resis/ResUtils.c
    resAddField will need to do something similar to extract to handle two clientData records for split tiles in ResAddPlumbing and ResRemovePlumbing.
    Some changes to stack handling were pretty hastily done and all should be checked for consistency.
    dbcUnconnectFunc() --- Need to handle side, or is this already handled?
    extConnFindFunc() --- Select proper region depending on side
    extInterSubtreeTile() --- Should handle non-Manhattan tiles.
    lefConnectFunc() --- write polygons to LEF?
    resExpandDevFunc() --- Probably needs to handle split types and replace STACKPUSH. (Lots to do in ResUtils.c)
    touchingTypesFunc() --- Should distinguish types between sides, and handle triangle geometry.
    cifInteractingRegions() --- Uses the cifSquares code, but cifSquares does not handle non-Manhattan geometry, and interacting/overlapping methods must.
    antennaAccumFunc() --- Needs to handle non-Manhattan geometry.
    extTransPerimFunc() --- Needs to set TT_SIDE based on boundary direction before calling DBSrConnectOnePlane (ExtBasic.c:3882)
    ResMultiPlaneFunc() --- Needs to call ResNewSDDevice() with dinfo information

Any routine that does not use "dinfo" should do "if (dinfo & TT_SIDE) return 0". No, that's not needed.

Ought to create a single function for simply returning "1" and replace all the assorted functions that do that individually.

---------------------

First pass is done, minus issues reported above that need to be worked on. The basic issue of adding an extra argument to all the callback functions is basically completed. Moving on to compiling. . .

CIFgen.c has lots of warnings around PUSHTILE(). Probably an incorrect cast, but I need to check all of the uses of PUSH and POP everywhere, anyway. (done)

1/2/2026: Compiles now! Debug. . .

Errors in dbwind (missed a DBSrPaintArea() call)
Need to pass dinfo to GrBox() and GrBoxOutline()

Reminder: Need to check for all instances of SplitSide(), as there should not be any. There are still 38 references that need to be removed. (done)

Fix: extWalkTop (Bottom, Right, Left). . . type should not have been set by the split side of tile "tp". Should be according to the side being searched and the diagonal directions of "tile" and "tp".
(done)

ResUtils: Cannot use STACKPUSH and STACKPOP

A number of remaining uses of SplitSide() are in routines that were not fixed for dinfo or are not directly called from modified search functions, so will need to fix each one in turn. (done)

Add dinfo to extSubtreeTileToNode() and extSubtreeHardNode()
extNodeToTile needs to return dinfo. . . (done)

extGetRegion() will need to be handled but only when I'm done with this and working on being able to attach two regions to a split tile.

===============

To do: Fix the several places where the compiler spits out warnings.

(1) ExtHier.c:43 use of TileType in extNodeToTile (extractInt.h:1060) (done)
(2) CIFgen.c:1237, 1339: Issues with using PUSHTILE and STACKPUSH, should not cast dinfo to type ClientData; use INT2CD(dinfo). Also didn't like (TileType)STACKPOP. . . Use (TileType)CD2INT(STACKPOP(...)) (done)

Okay, those are fixed.

===============

Large-scale tests:

(1) Ran on gf180mcu_ocd_sram_test; Running full DRC. (passed) (not exhaustive!)

Also need to test:

    GDS output (passed) (eh, not exactly) (okay, good now)
    GDS input (passed)
    extraction (oops, segfault) (ExtHier.c:518)
    extresist
    net selection
    antenna checks
    LEF read
    LEF write
    DEF read
    DEF write

Especially need to check the sky130 I/O where there are split tiles with both sides active. Still need to resolve the issue with attaching two net regions to a single split tile, and to return the correct region entry.

Oops, reading gf180mcu_ocd_sram_top.gds.gz back after writing, then letting DRC run, crashed at some point. Probably in DRC, but unsure. Will try to repeat. Yes, missed a routine, drcSubCopyErrors(), because it's run from DBNoTreeSrTiles().

Another issue---GDS input failed to read in some non-Manhattan tiles; use klayout to make sure that GDS output was correct. (yes) GDS may have just been corrected after an error was found; check GDS read again (nope). Doesn't happen with metal1, for example, only with psd (in the GF tech).
Issue is that PPLUS triangles are inverted in a number of places, so output generation was messed up somewhere. Tested and found "shrink" to be the cause. More precisely, DBDiagonalProc() was the cause. Its setting of TT_SIDE at the end was non-functional (despite the claim, DBUndo does not use it), and the bit was getting set in the tile and disrupting any code using "TiGetTypeExact(tile) | dinfo". Maybe there should be a guard against anything setting the TT_SIDE bit in a Tile's ti_body field?

Re-running run_gen_gds.sh on the SRAM test chip to see if that fixed the I/O corner cell. Well. . . Almost. cifoutput hierarchical checks generated extra non-Manhattan geometry on the top level which is not *wrong* but shouldn't be there and was not there before the code changes. Found a place where "dinfo" was not handled; fixed, and retrying. . . Good.

-------------------------

Next major error: PUSHTILE caused a "corrupted double-linked list (not small)" error, from ExtNghbors.c:197. Modifications to the code that shouldn't have changed anything seem to have made this pass, although can't tell yet if it works correctly.

Getting another error, "free(): invalid next size (fast)", on ExtFreeLabRegions() now. . . Maybe best debugged with valgrind. . .

Looks like it comes down to "extSubtreeHardNode()" being passed a split tile. Before that, extSubtreeTileToNode() on a split tile.

From extHierConnectFunc2(), ha->hierOneTile is split.
Split dir = 1, right type = 117 (metal 3), left type = space
Typical case. . . ha->hierType looks correct. TT_SIDE set, looking at right side, which is metal3.
Down in extSubtreeHardNode(), ttype = 117 (okay)
extSubtreeHardUseFunc called on POWER_RAIL_COR_1_0.
Then it calls ExtFindRegions on POWER_RAIL_COR_1, which created the label region, and then calls ExtLabelRegions, where tile tp is a simple metal 3 tile at (46068, 14000), "reg" is its ti_client, but reg had been freed.
Therefore, ExtFindNeighbors() at extract/ExtHard.c:509 did not search all of the tiles that were originally tagged by ExtFindRegions() at extract/ExtHard.c:207. Maybe this is due to how reg->treg_tile and reg->treg_type are set?

ExtFindRegions calls DBSrPaintClient() with callback extRegionAreaFunc()
extRegionAreaFunc() calls ExtFindNeighbors().
ExtFindNeighbors() uses macros like PUSHTILERIGHT, etc., which could always be wrong, as could the use of dinfo when deciding what to check and what to skip. Double-check everything in this routine. (looks okay)

The most problematic case is if a region's tile is set to a split tile. This will eventually not be a problem when the regions are handled between splits. But for now, it might cause serious problems. Try breaking when this happens and see if that might be related (as in, it only happens right before magic crashes). Or. . . just rewrite some of this so that magic doesn't try to move the region's tile off of the split tile? (There was one instance of this. Changed it and re-running). Presumably was a good thing to do, but didn't change anything regarding the crash condition.

Hm. But also: ExtBasic.c:4547 is doing the same thing. Changed that, still no luck. Grrr.

Might be the missing treg_type in ExtLabFirst, which would need to be added; the routine does not depend on the tile type, but would still require the dinfo to be saved. And. . . Still no luck. *sob*

"sublist" is related to sticky labels and is not being freed under some circumstances. I don't think this is related, but should be fixed.

Huh. Maybe try a careful check between the original and new versions of each of these files?
    extract/ExtRegion.c
    extract/ExtNghbors.c
    extract/ExtHard.c
    extract/ExtSubtree.c
    extract/ExtHier.c

If all else fails, create a routine to dump a list of tiles being set to regions, and tiles being cleared of regions, to be activated on POWER_RAIL_COR_0, so that the complete list of tiles being visited to set regions and tiles being visited to clear regions can be compared directly. Actually, this is probably more productive than looking at the file differences.

Run again so that at the point of failure, can move up the call stack to find a routine where a node or def can be checked for turning the diagnostic on or off.

When valgrind catches the use of freed memory, ExtLabelRegions def->cd_name is "POWER_RAIL_COR_1", although the last printed statement said that POWER_RAIL_COR_0 was being extracted; I think it's because this is a use. The routine common to both the error and the place where the memory was freed is extSubtreeHardUseFunc(). So:

1) At extSubtreeHardUseFunc, if use->cu_id is "POWER_RAIL_COR_1_0", enable the diagnostic
2) With the diagnostic enabled, list every region and every tile encountered by ExtFindNeighbors() that is connected to that region.

Or maybe can get more targeted than that?

Oh, no, . . . when I output diagnostics, the error doesn't occur. . . So how does LVS validation do?

The diagnostic output is long. May need to redo it as two files, so that the "set" and "reset" lists can be checked side by side for any discrepancies. But if no error occurs when diagnostic output is enabled, then how can I catch the error?

Well, divergent behavior *did* show up. At line 22266 of the output: It appears that ExtFindNeighbors() called from extHardFreeAll() stopped after the first tile.

    Tile @(46528 17492) type 0x50000075

Confirmed that this is
(1) Not the first time that a split tile is the first encountered, BUT
(2) This is the first time that a split tile is encountered with the active tile type on the left.
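For reading type words like 0x50000075 in the diagnostic output, a small decoder helps. This is only a sketch under stated assumptions: TT_DIAGONAL = 0x40000000 comes from these notes, while the TT_SIDE, TT_DIRECTION, and left/right mask values and the helper names (twLeft, twDump, etc.) are assumptions made for illustration and should be checked against the database headers before relying on them.

```c
#include <stdio.h>
#include <stdint.h>

/* Bit layout assumed for this sketch.  Only TT_DIAGONAL is stated in the
 * notes; the remaining values are assumptions. */
#define TT_DIAGONAL   0x40000000u
#define TT_SIDE       0x20000000u   /* assumed: set = right side selected */
#define TT_DIRECTION  0x10000000u   /* assumed */
#define TT_LEFTMASK   0x00003fffu   /* assumed */
#define TT_RIGHTMASK  0x0fffc000u   /* assumed */
#define TT_RIGHTSHIFT 14            /* assumed */

typedef uint32_t TypeWord;          /* hypothetical stand-in for TileType */

unsigned twLeft(TypeWord w)  { return (unsigned)(w & TT_LEFTMASK); }
unsigned twRight(TypeWord w) { return (unsigned)((w & TT_RIGHTMASK) >> TT_RIGHTSHIFT); }
int twIsSplit(TypeWord w)    { return (w & TT_DIAGONAL) != 0; }
int twDir(TypeWord w)        { return (w & TT_DIRECTION) != 0; }
int twSide(TypeWord w)       { return (w & TT_SIDE) != 0; }

/* Print a one-line, human-readable breakdown of a type word. */
void twDump(FILE *out, TypeWord w)
{
    if (!twIsSplit(w)) {
        fprintf(out, "0x%08x: plain tile, type %u\n", (unsigned)w, twLeft(w));
        return;
    }
    fprintf(out, "0x%08x: split, dir=%d, side=%s, left=%u, right=%u\n",
            (unsigned)w, twDir(w), twSide(w) ? "right" : "left",
            twLeft(w), twRight(w));
}
```

Under these assumptions, twDump(stdout, 0x50000075) reports a split tile with dir=1, the left side selected, left type 117, and right type 0 (space), which matches the "active tile type on the left" observation.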
Need to redo this and print the diagonal information. Not really necessary, though. Can stop printing diagnostics now and concentrate on finding what happens when ExtFindNeighbors encounters the tile at (46528, 17492) immediately after being called from extRegionAreaFunc or from FreeAll.

Now can break on ExtNghbors.c:138 and 142 when tile->ti_ll.p_x == 46528 && tile->ti_ll.p_y == 17492 and see what's going on.

Dinfo is 0x40000000 = TT_DIAGONAL.
topside skips.
leftside is run, pushes m3 (not split) tile at 42868, 17492.
Dinfo is 0x70000075 --> TT_SIDE has been set here, but should not have been.

Moving up, treg_type has been set to 0x70000075 for this region. Note that treg_ll is not the location of treg_tile (treg_ll = 14000, 42497). Need to find when treg_type was set inappropriately. Note that extractInt.h says that "treg_type" is the type of treg_tile, which was changed from "type of tile that contains treg_ll", which may be an indicator of the issue. . .

Watch where treg_type is set in ExtBasic.c and ExtHard.c. . . We've got:

    ExtHard.c:91
    ExtBasic.c:4610 (not relevant)

Setting reg->treg_type to dinfo is missing from extTransFirst.

Still not clear what's going on. The "labRegList" generated by ExtFindRegions() should be the same one as originally added in extLabFirst(). So rerun (again), break as above on ExtHard.c:91, and track when the reg->treg_type changed.

Looks like "extSetNodeNum" is the culprit. The type is changed to the new tile representing the lower left-hand corner and plane. It is not immediately clear if not saving dinfo with the lreg_ll and lreg_pnum information will cause problems, but that information should be recoverable in other ways (i.e., if the tile at point lreg_ll on lreg_pnum is split and the type at lreg_ll is not lreg_type, then the side must be changed).

$$#!@ still caused a segfault.

Okay, I screwed something up badly. The two processes now diverge on the very first call.
Now fails at tile (42868, 14000)

I do see an error, so try again. . . Ah, some light at the end of the tunnel! Maybe joy! Looks right. Removing diagnostics from the code and re-running with valgrind. . .

. . . And now it seems to have gone into an infinite loop. But I forgot to do a "make install" and may have just caused a massive issue. Did the install and re-ran. POWER_RAIL_COR does take a long time to run, so each parent cell will be worse, so it's likely just an issue with this cell, extraction, and valgrind. Let it run to completion, and then later compare to running without valgrind. Make sure that in both cases, the final netlist result passes LVS.

(Conclusion: Yes, it finished running under valgrind, eventually, and valgrind did not have any more issues. But that was a long running process and I will try to do that as little as possible.)

Side note: This example has another interesting feature which is that halfway through, it goes back to the prompt, which is a known issue with "extract" but it hasn't been obvious how it happens. It should be possible to Ctrl-C out of a long-running extraction but it should *not* be possible to run commands while the extraction is ongoing. It appears to happen because only part of the design is loaded when extraction starts. When magic goes to load the rest of the design, it returns to the prompt.

---------------

NOTE to self: The main thing now needing handling for extraction is to have extGetRegion(tp, dinfo) and call this appropriately everywhere. Where tp is part of a boundary record, it should be possible to derive dinfo.
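A rough sketch of what a side-aware region lookup along the lines of extGetRegion(tp, dinfo) might look like. Everything here is hypothetical and simplified: the structure names (SketchRegion, SketchSplitPair, SketchTile), the use of NULL for "nothing attached", and the TT_SIDE bit value are assumptions for illustration only, not the actual magic data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define TT_DIAGONAL 0x40000000u   /* value given in these notes */
#define TT_SIDE     0x20000000u   /* assumed value; set = right side */

/* Hypothetical stand-ins for the real extraction structures. */
typedef struct { int id; } SketchRegion;

typedef struct {
    SketchRegion *left;           /* region attached to the left half  */
    SketchRegion *right;          /* region attached to the right half */
} SketchSplitPair;

typedef struct {
    uint32_t body;                /* type word, including TT_DIAGONAL  */
    void    *client;              /* region, or split pair if diagonal */
} SketchTile;

/* Return the region for the half of the tile selected by dinfo.
 * Assumes a split tile's client always points to a SketchSplitPair
 * (or is NULL before anything has been attached).
 */
SketchRegion *sketchGetRegion(const SketchTile *tp, uint32_t dinfo)
{
    const SketchSplitPair *pair;

    if (!(tp->body & TT_DIAGONAL))
        return (SketchRegion *)tp->client;   /* plain tile: one region */

    pair = (const SketchSplitPair *)tp->client;
    if (pair == NULL)
        return NULL;                          /* no region assigned yet */
    return (dinfo & TT_SIDE) ? pair->right : pair->left;
}
```

The point of the sketch is that the caller always supplies the side, so nothing has to be written into the tile during a search. The real code would also have to distinguish the uninitialized and pending client values, which this simplification folds into NULL.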
---------------

For now, from January 5: Back to general checks: Repeating the list from above:

Need to test:

    GDS output (passed)
    GDS input (passed)
    extraction (pending) (fixed cap coupling issues)
    net selection (seems okay)
    antenna checks (okay)
    LEF read (okay)
    LEF write (okay)
    DEF read*
    DEF write*
    extresist

(* knowing that there are some errors with DEF read/write that are unrelated)

"def write" appears to have taken an excessively large amount of memory. This is probably not related to recent code changes but should be investigated. While the design tested is large, it is not large to the tune of 32GB+, which seems to be taken up entirely by defblockageVisit. This is unreasonable and must be fixed.

Tested extraction on sky130_fd_io__top_gpiov2_flat, which crashed immediately; however, it is known that it has split tiles with different nodes and will require split nodes to be handled properly. Make sure that's the issue, though. No, actually it's extAddOverlap needing an extra argument.

----------------------------------

Running some of the I/O torture test from sky130 in ~/projects/efabless/sky130_fd_io/. These tests are important because they are part of the reason for fixing the non-Manhattan code, since the requirement of setting two regions per tile is needed for several cells in this I/O set.

In lvs_tests/

First pass, running "run_top_sio.sh" resulted in magic hanging in DRCFindInteractions() while extracting "sky130_fd_io__sio_ipath_com". Here, drcSubcellFunc() is getting called alternately on uses sky130_fd_io__sio_com_m2m3_strap_5 and sky130_fd_io__sio_com_m2m3_strap_6. Given the recent work around DRCFindInteractions() there is a good chance this has nothing to do with split tiles. (Confirmed)

Uh oh.

    subUse = sky130_fd_io__sio_com_m2m3_strap_5
    subUse->cu_bbox = 627, 3428 to 1485, 214751792

which sounds bogus.

    subUse = sky130_fd_io__sio_com_m2m3_strap_6
    subUse->cu_bbox = 1615, 3248 to 2473, 214641792

which is equally bogus.
But this appears to derive from the .mag files in the library. Bogus entry is in the sky130_fd_io__sio_com_m2m3_strap.mag file: "rect 164 319 214748364 321" on layer "comment". Remove this entry and correct the "box" entries in "sky130_fd_io__sio_ipath_com" to "0 0 364 858".

This requires a separate investigation. I have not compiled the sky130 PDK for a while. There are some arrows drawn on layer "comment" that appear to have been mangled on GDS input. They should probably be removed from the database. However, this suggests an issue with GDS read-in. There are multiple "strap" layouts, all of which have this issue. Need to recheck the GDS read-in.

Maybe just rebuild the sky130 PDK (using the previous version of magic)? That seems to have corrected the issue, which might have been caused by building the PDK with a bad version of magic.

Doing "run_top_sio.sh" works now, although with the same errors as it had historically (waiting for proper handling of regions on split tiles). The "run_top_sio.sh" script now runs with surprisingly few issues. Three metal1 resistors are missing and the grounds are not cleanly separated, and very little else.

"top_pwrdetv2" had been a problem but now succeeds, which is pretty significant.

---------------

Split region handling:

Still need to fix boundary checks: extTransPerimFunc(), extSideLeft(), etc., etc. Look for "(TileType)0" for places that need fixing.

Boundaries: Should define directions for non-Manhattan tiles. b_inside = b_outside, b_segment follows the diagonal.

Replace extUnInit with CLIENTDEFAULT and remove extUnInit as a global variable, as that is ridiculous. (Done, along with associated stupidity extNbrUn and also passing the value to ExtFindRegions().)

Need to understand these functions better. . . For example, ignoring the coupling cap stuff for now, extOutputDevices() scans transList, sets tr_perim = 0, and calls ExtFindNeighbors() from the region's tile (a tile belonging to the device) with arg.fra_each = extTransTileFunc.
Initial perimeter is 0. For each tile called back by ExtFindNeighbors, call extEnumTilePerim() with function extTransPerimFunc().

Anything currently with (TileType)0 or which calls simply TiGetType() needs fixing:

    extSideCommon(): Pass boundary and use extGetBoundaryTypes()

Okay, but this still does not account for everything that needs to be done when checking coupling between non-Manhattan edges. But it should keep things from crashing or producing stupid results.

Oh, no. fra_uninit is being used to process ExtFindNeighbors with a specific node like the transistor gate being considered "uninitialized".

    ExtNghbors.c:247 --- Need to handle separately; move "continue" down into each of the conditionals (done)
    ExtNghbors.c:137, 187 --- Set dinfo appropriately for top and bottom sides. (done)

(may complete the handling of ExtFindNeighbors() and also properly eliminate extNbrUn as a global variable.)

Whew. Is that all? (Almost certainly not.) Yes, missed code at ExtBasic.c:5203 and below. (fixed)

Okay, it compiles again! Time to test again!

Testing from ~/projects/efabless/sky130_lvs/ script "./run_pad_lvs_2.sh extract". Possibly magic crashed while doing the extraction. . . ?

Eww, died at "extract unique notopports". ExtRegion.c:304. reg = NULL. Tile type locali in "sky130_fd_io__res75only_small", ended up with NULL ClientData; this isn't supposed to happen, is it? Seems like extHasRegion() failed. Ah, no, it's defined so that only CLIENTDEFAULT is considered "not a region". But I changed stuff around that. . . so. . . ExtSetRegion() never passed reg = 0, so it ended up as 0 some other way. Ah, this is the definition of VISITPENDING, so all tiles get ti_client = 0 on PUSHTILE.

Conclusion: "ExtGetRegion(tile, dinfo) == arg->fra_region" failed for some reason. arg->fra_region is non-NULL, and ExtGetRegion returned 0, so this check should have failed. Failed at ExtNghbors.c:120. Ah---there's an improper semicolon there!

But, still died. But it's further along.
Died where it tried to access the split region structure. "tp" is definitely the known split tile with two regions. ExtGetRegion() has been called before a region has been assigned. Forgot to handle this case.

Now deeper. . . Split tile handling is more or less correct but ExtFindRegions() never visited the gate side of the tile, and so its region was still set as 0 ("VISITPENDING"), meaning that it was reached but somehow didn't get handled by POPTILE.

Looking only at cell sky130_fd_io__signal_5_sym_hv_local_5term:

Breaking on "extTransFirst"---This is a double-node tile. The region's tile and type is set to this tile, but the region isn't attached to the tile here.

extTransFirst sets right region of tile at 695, 1712.
extTransEach (called from ExtFindNeighbors) calls extSetNodeNum() on the same tile, same side.
Next tile is regular transistor non-split tile at 735, 672

Breaking on ExtSetRegion():

(1) Tile at 695, 1712 sets right region. Left type remains unvisited. (Okay)
(2) Tile at 735, 672 sets only region
(3) Tile at 855, 672 sets left region. (okay)
(4) Tile at 855, 1712 sets left region. (okay)
(5) Tile at 695, 672 sets right region. (okay)
(then there is an unrelated rmetal1 device)

This all looks right. . . so what happened?

The tile at 695, 672 is left in transList (first entry, end of linked list)
In loop at ExtBasic.c:2189, "reg" is set to this region pointer. But it still seems to be failing at ExtBasic.c:2227. This should not return 0!

So now the tile at 695, 1712 has a left region set but not a right region! I think that the code wrote over the original and created a new ExtSplitRegion. . .

Look at ExtSetRegion whenever the tile is the one at 695, 1712. It should be visited twice.

2nd time visited: Okay, this is the drain side. (region is poly tile at 655, 558). Except---A poly tile should be part of the gate node?? Assume not, for now. Check afterward.

Houston, we have a problem. The client for this tile was reset to 0.
Run again and check the client value for changes.

Reset happened in DBResetTilePlaneSpecial().
From: ExtResetTiles() in ExtRegion.c:529
From: extBasic() at ExtBasic.c:236

So problem is: extFindNodes() at ExtBasic.c:279 should have marked the tile again, but didn't. See extNodeAreaFunc(). Break here on the same tile as above (at 695, 1712). It never got there? Never called extNodeAreaFunc() at all. I have done something improper with CLIENTDEFAULT there. . . ?

Found a tile in which ti_client was still set to a region, so ExtResetTiles() failed to reset all tiles. . . ? Check what DBResetTilePlaneSpecial does; on plane 10, I'm seeing the tile at (121, 2271), which is the first non-space tile in the search, not having been reset. (break from ExtRegion.c:529).

Argh, that doesn't match what I saw before! Start over. . . Try again with the tile at (695, 1712). Find it when it is first encountered at ExtSetRegion and track its changes thereafter.

(1) break ExtSetRegion if tile->ti_ll.p_x == 695 && tile->ti_ll.p_y == 1712
(2) print &tile->ti_client
(3) watch *(this value)

Result:

1. ExtBasic.c:4496, sets the client pointer to an ExtSplitRegion.
2. DBResetTilePlaneSpecial() sets it back to CLIENTDEFAULT.
3. extNodeAreaFunc() sets it to VISITPENDING
4. ExtResetRegion() sets it back to CLIENTDEFAULT
   From ExtBasic.c:5280

Don't understand the use of ExtResetRegion() here. The "Count split tile twice" comment comes from old code. It should not do that, right? This code needs to go. See the similar code in ExtFindRegions which was already fixed correctly.

Back to running run_pad_lvs_2.sh. . . But do the full pad extraction manually first.

Oops. Crashed. Looks like this one died on sky130_fd_io__gpiov2_buf_localesd. That has two of the flanged gate transistors, in a different orientation.

Problem freeing in DBResetTilePlaneSpecial(). But need to recompile with the correct malloc. . . And may need to go back to valgrind.
DBResetTilePlaneSpecial() tried to free ti_client which was set to CLIENTDEFAULT. May be a trivial error.

Lookin' good!

(Note: When running run_pad_lvs_2.sh directly, reading the GDS takes a long time because it is reading cells out of order and having to continuously recycle the entire file. Do not be alarmed, but it should be investigated.)

Darn. Although it worked on the magic database, it crashed on the GDS database. Not done yet! Will probably need valgrind for this one.

Running ExtFreeLabRegions() on the node region list passed back from extBasic() but somewhere it ended up on a bogus entry.

According to valgrind, there was an entry in nodeList that was also in transList. The transList is cleaned up by extBasic(), then magic crashes when extCellFile() cleans up nodeList and tries to free the same entry.

Block allocated at extTransFirst()---> TransRegion.
Then freed by ExtFreeLabRegions() at end of extBasic().

Suspiciously, this is in cell sky130_fd_io__signal_5_sym_hv_local_5term which means that it is almost certainly due to the split tile region code.

With a diagnostic check, confirmed that there is an entry that is in both transList and nodeList. Need another diagnostic check to figure out how that happened. Some part of the code must be confusing the two lists?

Assume this doesn't happen during "extFindNodes". In that case, ExtLabelRegions() has editable access to the nodeList. . .

Pinned it down to extOutputDevices(). . . Number of node regions was axed from 9 to 6 by ExtFindNeighbors() called from ExtBasic.c:2283. It has access to the node list in extTransRec, but it should not mess with the node list. . . At least now can check only nodeList and watch for truncation.

Finally pinned it down to ExtSetRegion(). Suggests that maybe ExtBasic.c:4543 ran but csr was *not* an ExtSplitRegion and something gets overwritten. . . Yes, exactly.
If the clientdata is a node region, then expecting it to be a split region and setting "reg_left" will overwrite "nreg_next" and mess up the node list.

Break instead on ExtBasic.c:4534. It's possible that the tile region is being set by something other than ExtSetRegion. . .

(1) tile at 855, 1712: client is 0.
(2) tile at 855, 672: client is 0.
(3) tile at 695, 672: client is 0.
(4) tile at 695, 1712: client is 0.
(1-4 is unrelated def)
...
(5) tile at 695, 1712: client is 0.
(6) tile at 855, 672: client is 0.
(7) tile at 855, 1712: client is 0.
(8) tile at 695, 672: client is 0.
(9) tile at 855, 1712: client is 0.
(10) tile at 855, 672: client is 0. <--- Stop here and watch the ti_client space.
(11) tile at 695, 672: client is 0.
(12) tile at 695, 1712: client is 0.
(all happened before diagnostic checks)
(13) tile at 695, 1712: client is 0.
(14) tile at 855, 672: client is non-zero, points to a node region.

This is not what sets the region, but PUSHTILE and variants are setting the client regardless of the split status.

DBconnect.c:533 also sets ti_client directly.
DBconnect.c:519 also sets ti_client directly, to csa->csa_clientDefault.
The call to DBSrConnectOnePlane() at ExtBasic.c:2140 may be very problematic. . .

For now, changing PUSHTILE and its variants to use ExtSetRegion(), and adding a guard band to the split region structure to catch immediately if a region pointer is mistaken for a split region structure.

Lost some text during a power outage, not sure why; thought I had saved all that. Anyway, the current plan is to create extEnumTerminal to replace DBSrConnectOnePlane and to use a linked list instead of depending on the tile ClientData, which is being modified by the outer loop. After which DBSrConnectOnePlane can be removed, as it is not used elsewhere.

Also: extTermAPFunc() also needs to replace ti_client checks with ExtGetRegion().

All done! Time to check!

Ah, well, crash and burn, time for debugging. Looks like a problem with boundary types.
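For reference, the guard-band idea mentioned above could look something like the following. This is a hedged sketch, not the actual change: the marker value SPLIT_GUARD, the structure layouts, and the helper names are all hypothetical, and a marker word only makes it very unlikely (not impossible) that a node region is mistaken for a split record.

```c
#include <stdint.h>
#include <string.h>

#define SPLIT_GUARD 0x53504c54u   /* arbitrary marker value (assumption) */

/* Hypothetical node region: the list link is the first field, which is
 * exactly what gets clobbered if it is mistaken for a split record. */
typedef struct sketchnode {
    struct sketchnode *nreg_next;
    int                id;
} SketchNodeRegion;

/* Hypothetical split record with the guard word as its first field. */
typedef struct {
    uint32_t guard;
    void    *reg_left;
    void    *reg_right;
} SketchSplitRegion;

/* Return 1 if client points at something carrying the guard marker. */
int sketchIsSplitRegion(const void *client)
{
    uint32_t word;

    if (client == NULL)
        return 0;
    memcpy(&word, client, sizeof word);   /* read first word safely */
    return word == SPLIT_GUARD;
}

/* Set the left region only if client really is a split record.
 * Returns 0 on success, -1 if the pointer is not a split record. */
int sketchSetLeft(void *client, void *reg)
{
    if (!sketchIsSplitRegion(client))
        return -1;                         /* refuse to overwrite */
    ((SketchSplitRegion *)client)->reg_left = reg;
    return 0;
}
```

With a check like this in place, an attempt to set "reg_left" through a node region pointer fails immediately at the point of the mistake instead of silently truncating the node list.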
Boundary inside type is tile at 855, 672, dir=1 left type 29, right type 46

With dir=1, the top and right sides are the same (46) and the bottom and left sides are the same (29). "29" is the transistor type.

Boundary direction is 2 = BD_TOP, meaning that the "inside" tile is below the boundary. This is wrong, because as noted above, the top of the tile, which is the side facing the boundary, is not the transistor type. Directions get set wrong somewhere.

Good news is that the error is happening on a split tile with neither side being space, so this is at least broken new code, not broken old code.

Well, it's improper old code. extEnumTilePerim() does not skip sides of split tiles as it should. It "calls the function on the perimeter tiles as if the whole tile is the transistor type". Fixing this is likely to result in having to handle non-Manhattan geometry in the routine that determines device width and length.

"sides" is already set to the two sides of a split tile that need to be ignored. For this case, tpIn is on the bottom of the boundary, dir=1, and dinfo=0, so top and right should be ignored. But "sides" is 9, which is BD_LEFT (1) + BD_BOTTOM (8), so "sides" is marking the sides to be handled, not the sides to be ignored. (Fixed)

Okay, both issues fixed but don't know what that does to W, L calculations. To be determined! I have thought about this before, though, and think that non-Manhattan edges can just be added together (subtract edges in opposite directions) and divide by 2; need to work out some examples on paper.

Well, that at least extracted the gpiov2 I/O cell without crashing. Produced a lot of "Warning: Device has more terminals than defined for type" messages which will need to be investigated. These appear to be related to resistor extraction and don't seem to involve non-Manhattan geometry.

There are a lot of feedback entries of the type "Cannot find the name of this node (probable extractor error)".
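Going back to the "sides" mask for a moment, the bookkeeping can be written down as a tiny function. This is a sketch under assumptions: BD_LEFT = 1, BD_TOP = 2, and BD_BOTTOM = 8 are the values quoted in these notes, while BD_RIGHT = 4, the TT_SIDE bit value, the dir=0 case (taken to be the mirror image of dir=1), and the function name sketchHalfSides are assumptions for illustration.

```c
#define BD_LEFT    1
#define BD_TOP     2
#define BD_RIGHT   4             /* assumed; not stated in the notes */
#define BD_BOTTOM  8

#define TT_SIDE    0x20000000u   /* assumed value; set = right half */

/* Return the mask of Manhattan sides that border the selected half of a
 * split tile.  Per the notes, with dir=1 the top and right sides carry
 * the right type and the bottom and left sides carry the left type.
 * dir=0 is assumed to be the mirror case (top goes with the left type).
 */
int sketchHalfSides(int dir, unsigned dinfo)
{
    if (dinfo & TT_SIDE)
        return BD_RIGHT | (dir ? BD_TOP : BD_BOTTOM);
    return BD_LEFT | (dir ? BD_BOTTOM : BD_TOP);
}
```

For the case above (dir=1, dinfo=0) this gives BD_LEFT + BD_BOTTOM = 9, which is the set of sides to be handled; the sides to skip are the complement of that mask.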
So the width of the non-Manhattan transistor is way off (0.66um, not 5.4um or something close to it). Also the drain terminal node of the device got lost and was output as "(none)". However, basically every transistor drain node was output as "(none)" so it's apparently not related to non-Manhattan geometry. Except that a handful of cells have drain nodes listed. Don't yet know what the difference is.

The "(none)" drain node is in the .ext file. It has area/perimeter values. Given the amount of non-Manhattan wiring in these cells, I would hazard a guess that the problem is deriving node names from regions whose lower corner is a non-Manhattan tile. Previously if the lower corner fell on a non-Manhattan tile, it was moved. Perhaps the node name mechanism can be revised, but I think using the lower left corner position for the node name should be consistent even if the layer type isn't at the coordinate.

Debugging: "(none)" names: extTransOutTerminal passed NULL for lreg.
extTransRec: tr_termnode indexes are off; there are entries at 0 and 2 but not at 1.

How did the indexes get off? It all happens within extOutputDevices(). Index is "termcount". On any cell that is generating "none" nodes, break on the next device, at ExtBasic.c:2192.

Uh. . . Problem is that ti_client on a source/drain node tile was set to zero. . . Note this is the value of VISITPENDING.

Tile is mvndiff at 5130, -790 in CellDef "sky130_fd_io__gpiov2_in_buf".
Gate tile is simple "mvnmos" at 5141, -802.

    LB(tile) = poly at 5141, -834 (but check why client is not the same as the gate?)
    BL(tile) = mvndiff at 5085, -802
    TR(tile) = mvndiff at 5241, -280
    RT(tile) = poly at 5141, -202

Okay, but there's no null region here. . .
The boundary is at (5141, -790) to (5141, -756)

Yeah, okay, the S/D region tiles are broken up by contacts. The full gate is (5141, -802) to (5241, -202), so the boundary is partway up the left side of the gate.
Walk up the gate left side by looking at BL(tile) then looking at RT(): (5085, -802), (5130, -790), (5085, -756), ... *only* the tile at (5130, -790) has clientdata 0. How did a tile get left with VISITPENDING set and not get visited? Something went wrong with the basic PUSHTILE/POPTILE. But I'm guessing that my new code is at fault. . . Okay, I think it's fixed now. Extracted the subcell properly.

New problem now: Goes into an infinite loop on one of the cells in sky130_fd_io__top_gpiov2. Unsurprisingly in extEnumTerminal. Cell is "sky130_fd_io__gpiov2_ctl_lsbank". Device is rmetal2 at 15028, 104. Tile passed to extEnumTerminal is metal2 at 15029, 103. The problem may persist; the tile passed to extEnumTerminal() has ti_client = 0. Oops again.

Getting better. Still have an issue with node names that cannot be found. These appear in various places in the .ext file, but always on a "cap" or "merge" line and appear to indicate a problem finding a node in a subcell. Start with a small cell: "sky130_fd_io__com_ctl_hldv2". Note that the feedback for these errors is always on a split tile.

Failing at ExtSubtree.c:1141. Break on ExtSubtree.c:1127. Error occurs first time. "tp" has dir=0, type metal1 on the right side. dinfo indicates right side. The region is set to CLIENTDEFAULT, which is the problem: The tile was not given a region. Tile at 4082, 2553. Everything around it also has CLIENTDEFAULT. This is from the cumulative extraction. . .

Going into extHierConnectFunc1, sourceDef is __EXTTREE1__ and appears to be properly tagged with regions. Then it searches in __ext_cumulative which appears not to be tagged with regions.
extHierConnectFunc2: Overlap area is (4082, 2553) to (4118, 2553). Abuts but does not overlap, which is correct. extConnectsTo() is TRUE, so it gets "name1" from extSubtreeTileToNode(). (doHard = TRUE). I think that no region on this tile simply means that there wasn't a label on the node.
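To make the VISITPENDING failure mode above concrete, here is a small Python sketch of a push/pop region walk in which the two halves of a split tile are queued as separate (tile, side) entries and an invariant check catches any entry left pending. This is not the PUSHTILE/POPTILE code from magic; the function, the (tile, side) keys and the `neighbors` callback are all hypothetical.

```python
CLIENT_DEFAULT = None   # stand-in for CLIENTDEFAULT (tile never reached)
VISIT_PENDING = 0       # the notes say VISITPENDING has the value 0


def flood_region(start, neighbors, client, region):
    """Label everything reachable from `start` with `region`.

    `start` and the keys of `client` are hypothetical (tile, side) pairs, so
    the two halves of a split tile are tracked independently.
    `neighbors(key)` yields the connected (tile, side) pairs.
    `region` must be something other than 0 or None.
    """
    stack = []

    def push(key):
        # Mark at push time so that an entry is never queued twice.
        if client.get(key, CLIENT_DEFAULT) is CLIENT_DEFAULT:
            client[key] = VISIT_PENDING
            stack.append(key)

    push(start)
    while stack:
        key = stack.pop()
        client[key] = region
        for nxt in neighbors(key):
            push(nxt)

    # Invariant: nothing may still be pending once the stack is empty.
    leftover = [k for k, v in client.items() if v == VISIT_PENDING]
    assert not leftover, f"entries left pending: {leftover}"
    return client
```

A tile left at 0 after the walk, as seen above, would trip the final assertion, which points at a push without a matching pop (or a pop that recorded the wrong side).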
Try breaking on ExtSubtree.c:1085 if tp->ti_ll.p_x == 4082 && tp->ti_ll.p_y == 2553. Ahh, missed fixing a call to DBSrPaintNMArea. . . That would do it. . . Hey, victory! Now try LVS. . . top_gpiov2: fails but with relatively few errors (3 m1 resistors, a handful of nets). Not analyzed yet. top_power_hvc_wpadv2: Passes LVS (yay!) top_gpiov2 ESD nfet now shows W=5u in layout extraction. That is a result of the non-Manhattan tiles being ignored for width calculations. Need to see what the length and width calculation routine is doing, and how the non- Manhattan edges can be incorporated. Also check some annular FETs with corner bevels, including ones with both inside and outside bevels. Good test: Run LVS on the GF SRAM test chip top level again. Got some "no such node" errors on ext2spice, most related to VSUBS. Those errors were not there before. Occurs on a handful of SRAM core cell pairs in the 1k block. Predictably, LVS on the top level fails due to a section of unconnected substrate in the 1k SRAM block. However, there may be more going on because the VSS and DVSS nets are shorted in the layout netlist. -------------The next 350 lines are basically a digression to fix an error that has nothing to do with the new code. The VSS/DVSS mismatch comes from the corner cell. A similar (the same?) error was fixed recently. Re-running the I/O library validation script. Note: Probably the same error. My commit message says that "it is still not clear what the problem is". A workaround of collapsing an unnecessary level of hierarchy in the cell made the problem go away. Apparently, reworking the magic code has brought it back again. Ugh. The weird thing about this error is that GF_NI_COR_BASE appears to be correct and has independent VSS and DVSS nodes, but the top level shorts them; and the only thing in the top level is the GF_NI_COR_BASE subcell and a bunch of metal5 pins! Oh, wait, there is a cell POWER_RAIL_COR. 
And POWER_RAIL_COR_1 has the substrate contacts but no isosub; could be an issue. POWER_RAIL_COR_0 merges substrate contacts to VSUBS. POWER_RAIL_COR keeps VSUBS as a separate node. VSUBS appears to be kept separate throughout and does not actually appear to be involved in the error as far as I can tell. merge links (not necessarily direct connections): "POWER_RAIL_COR_0/VSS" "GF_NI_COR_BASE_0/power_via_cor_5_0/m2_6384_44992#" "GF_NI_COR_BASE_0/power_via_cor_5_0/m2_6358_25638#" "GF_NI_COR_BASE_0/moscap_routing_0/m1_9473_n8392#" "POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/POWER_RAIL_COR_1_0/DVSS" Suspiciously, there is feedback left saying that VSS is connected to more than one unconnected node. Try "extract unique"? Is this just a very hard-to-see labeling issue? Okay, with "extract unique" I see three lines merging "VSS" with "DVSS"! All in the top level. . . But this one?: merge "DVSS_uq0" "POWER_RAIL_COR_0/VSS_uq0" In POWER_RAIL_COR_0, that's the 4th rail from the inside. There is no way to determine from the "merge" lines of the .ext file where the short happened. (ha_connHash) in ExtHier.c or ExtSubtree.c extHierConnections() Check at: ExtHier.c:213 ExtHier.c:428 ExtHier.c:533 ExtHier.c:643 For when one node belongs to VSS and the other to DVSS, at the top level. Did a special debugging string check First failed at the third set: node1 (length 3) is POWER_RAIL_COR_0/DVSS next name: GF_NI_COR_BASE_0/moscap_routing_0/m1_9481_n11541# next name: DVSS node2 (length 55) is POWER_RAIL_COR_0/VSS next name: GF_NI_COR_BASE_0/comp018green_esd_clamp_v5p0_2_0/top_route_1_0/m1_0_106# next name: GF_NI_COR_BASE_0/power_via_cor_5_0/m2_2497_25638# ha->hierOneTile vs. cum ha->hierOneTile at (69990, 58532) type metal3 cum at (70059, 58543) type via2 ------------- 2nd failure seems to be directly downstream of the first and is not worth investigating. 1st failure, node2 is already conflating the two nodes, so get a list of all the node names and try to pare it down further. 
Will take some work. Here are the 55 names. There is not necessarily any order to these. The string "DVSS" occurs only once in this list, and "VSS" only twice.

POWER_RAIL_COR_0/VSS (VSS)
GF_NI_COR_BASE_0/comp018green_esd_clamp_v5p0_2_0/top_route_1_0/m1_0_106# (DVSS)
GF_NI_COR_BASE_0/power_via_cor_5_0/m2_2497_25638# (DVSS)
GF_NI_COR_BASE_0/power_via_cor_5_0/m2_2497_11238# (DVSS)
GF_NI_COR_BASE_0/power_via_cor_5_0/m2_2497_6432# (DVSS)
GF_NI_COR_BASE_0/power_via_cor_5_0/m2_2497_3232# (DVSS)
GF_NI_COR_BASE_0/power_via_cor_5_0/m2_2497_32# (DVSS)
w_14068_57561# (DVSS)
POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/VSS (VSS)
GF_NI_COR_BASE_0/power_via_cor_3_0/m2_2517_37527# (VSS)
VSS (VSS)
GF_NI_COR_BASE_0/power_via_cor_3_0/m2_12842_35238# (VSS)
GF_NI_COR_BASE_0/power_via_cor_3_0/m2_8741_35238# (VSS)
GF_NI_COR_BASE_0/comp018green_esd_clamp_v5p0_1_0/comp018green_esd_rc_v5p0_1_0/VMINUS (VSS)
GF_NI_COR_BASE_0/power_via_cor_3_0/m2_6358_35238# (VSS)
GF_NI_COR_BASE_0/moscap_corner_3/VMINUS (DVSS)
GF_NI_COR_BASE_0/power_via_cor_3_0/m1_14757_35210# (VSS)
GF_NI_COR_BASE_0/moscap_corner_5/VMINUS (DVSS)
GF_NI_COR_BASE_0/moscap_corner_6/VMINUS (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n33412_n19921# (DVSS)
GF_NI_COR_BASE_0/moscap_corner_2/VMINUS (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n39839_n19921# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n43259_n19921# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/a_n47022_n23957# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n27687_n31792# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n32571_n31792# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n30132_n33522# (DVSS)
GF_NI_COR_BASE_0/moscap_corner_4/VMINUS (DVSS)
GF_NI_COR_BASE_0/moscap_corner_2_0/VMINUS (DVSS)
GF_NI_COR_BASE_0/moscap_corner_1/VMINUS (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/a_n40901_n30121# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/a_n36513_n34394# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n22219_n36968# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n27687_n36968# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/a_n34409_n36435# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n24920_n39073# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/a_n30340_n40567# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n14699_n44488# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n20058_n44488# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/a_n28236_n42608# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n17620_n45972# (DVSS)
POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/DVSS (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n11382_n46725# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n17715_n46725# (DVSS)
GF_NI_COR_BASE_0/moscap_corner_3_0/VMINUS (DVSS)
(on active plane offsides, this is the VSS border substrate tap)
* POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/POWER_RAIL_COR_1_0/a_13097_44708# (VSS)
(next two: revised, these are treated as DVSS)
** GF_NI_COR_BASE_0/dw_13436_13361# (DVSS)
** w_13448_13361# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n11878_n47667# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/a_n24191_n46716# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n14844_n49493# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n7016_n51467# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n9452_n51467# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/m1_n11878_n51467# (DVSS)
GF_NI_COR_BASE_0/moscap_routing_0/a_n20244_n50771# (DVSS)

* out in space below the cell's diagonal edge
** far out in space below the cell's diagonal edge.

Now figure out how many times we land on a node merge in which any name from the list above shows up, and then track that linked list until the wrong side gets added.

1st node to show up is "w_13448_13361#", the one that is way offsides. Connects to GF_NI_COR_BASE_0/dw_13436_13361#, the other way offsides one.
Next to show up is the last of the offsides nodes. But still VSS. Need to break when node1 != node2.
Next is GF_NI_COR_BASE_0/moscap_routing_0/a_n20244_n50771# and POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/DVSS.
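The "track the list until the wrong side gets added" idea can be sketched as a union-find over node names that remembers which supply domain each set belongs to and reports the first merge joining two different domains. This is an illustration in Python only, not the debugging check that was added to ExtHier.c; the class and method names are hypothetical.

```python
class NodeSets:
    """Union-find over node names, remembering the supply domain of each set."""

    def __init__(self):
        self.parent = {}
        self.domain = {}          # root name -> "VSS", "DVSS", or absent

    def find(self, name):
        self.parent.setdefault(name, name)
        root = name
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every name on the path directly at the root.
        while self.parent[name] != root:
            self.parent[name], name = root, self.parent[name]
        return root

    def tag(self, name, domain):
        """Record that `name` is known to belong to `domain`."""
        self.domain[self.find(name)] = domain

    def merge(self, name1, name2):
        """Merge two nodes.

        Returns (domain1, domain2) if the merge joins two differently tagged
        sets, i.e. the point where the short is introduced; otherwise None.
        """
        r1, r2 = self.find(name1), self.find(name2)
        if r1 == r2:
            return None
        d1, d2 = self.domain.get(r1), self.domain.get(r2)
        self.parent[r2] = r1
        if d1 or d2:
            self.domain[r1] = d1 or d2
        return (d1, d2) if d1 and d2 and d1 != d2 else None
```

Tagging only the few names known for certain (the labeled VSS and DVSS nodes) and feeding every merge through `merge()` in order would flag the first bad merge directly instead of its downstream effects.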
Okay, caught it at the third spot again: name1 is w_13448_13361# (the way offsides one) (VSS) name2 is GF_NI_COR_BASE_0/moscap_corner_3_0/VMINUS (DVSS) Why were these merged? cum: Type "pwell" at (39696, 21496) ha->hierOneTile type "isosubstrate" at (39696, 21496) Gotta think about this one. . . Okay. Substrate generation for extraction caused this. The substrate generation must not be honoring split tiles. The isosub tile that follows the angled corner of the cell has been placed over the entire area of the tile, where it then overlaps the VSS area and shorts. But I don't see that obviously. Check dbEraseNonSub(). Not there, but did find a place where split tiles were not handled. But fixing that didn't fix the problem. Will need to track down that hierOneTile "isosub" type (type 10) If breaking at extHierConnectFunc1() when oneTile has type 10, first occurrence is x = 43896, 14405 (should probably look for split tiles, too, although the problem above appeared to be on a non-split tile). (43896, 14405) to (45298, 14498). The tile below it is split (at 43896, 13361 to 44940, 14405). Check this? (okay; see below) Then, tried breaking when oneTile is at (39696, 21496) oneTile comes from sourceDef which is __EXTTREE1__. sourceDef = oneFlat's CellDef. No clue. Break to figure out where these tiles were located. (43896, 14405) is fine and located inside the isosub area. (39696 21496) is also fine. The tile that generated the node way out in the middle of nowhere that was supposedly an isosub tile was at (13436 13361). This tile would only be found in GF_NI_COR_BASE and would be a split tile. The node position by name would be the corner outside the isosub area. As a *node name*, since it is the lowest plane, it would refer to any node connected to the area. Another thought to pursue: moscap_corner_3_0 and moscap_corner_2_0 both have bounding boxes that extend over the isosub region. I may be barking up the wrong tree here. 
The node name "GF_NI_COR_BASE_0/dw_13436_13361#" is probably correctly DVSS, not VSS, because it represents the isosub split tile. "w_13448_13361#", however, should be a pwell tile that was drawn for the generated substrate, and should be VSS. But the two nodes being merged are both DVSS. One of them got the wrong name. So go back to where the merge occurred and find how the name was generated. Breaking again at ExtHier.c:522 when cum location is (39696, 21496) Breaking before the call to find the name, since it's the name that doesn't match the node somehow. ha->has_nodename is extSubtreeTileToNode(). First break is in POWER_RAIL_COR_0. Not what I'm looking for. Returns DVSS. Second break is POWER_RAIL_COR, also not what I'm looking for. Also returns DVSS. Third break is top level. "cum" has ti_client CLIENTDEFAULT. In ExtSubtree.c:1085. r = (39696, 21496) to (41098, 22898). type of tile is pwell. There is no pwell drawn in the cell, so pwell has been created by the substrate generation routine. et->et_lookNames is the top level cell. extConnFindFunc() lands on tile at (13448, 13361). Split tile, area (13448, 13361) -> (44940, 44853) direction = 1, pwell on right side dinfo is the right side, so this checks out. Bottom line is that "w_13448_13361#" is the expected node name for this tile and represents DVSS. It does, however, point to an issue (unrelated to this example) that a tile with two different types neither of which is TT_SPACE would produce the same default node name for both of them, so the default node name generator needs to account for the side of a split tile in the name, with an extra character. Presumably not the problem here, though. 
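A sketch of what a side-aware default node name could look like, in Python. The format is inferred from names that appear in these notes ("w_13448_13361#", "m1_n33412_n19921#", and later "dw_13436_13361x#"); the function name, its arguments, and the choice of "x" for the right-hand side are assumptions, not the generator in the magic source.

```python
def default_node_name(prefix: str, x: int, y: int, right_side: bool = False) -> str:
    """Build a default node name from a layer prefix and a tile position.

    Negative coordinates are written with an "n" prefix, as in the names
    seen in the .ext files. `right_side` appends an extra "x" so that the
    two halves of a split tile at the same position get distinct names.
    """
    def coord(v: int) -> str:
        return f"n{-v}" if v < 0 else str(v)

    suffix = "x" if right_side else ""
    return f"{prefix}_{coord(x)}_{coord(y)}{suffix}#"
```

The point of the suffix is only that two non-space types sharing one tile origin no longer collide on the same default name.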
Moving along, then: First encountered VSS at name1 = GF_NI_COR_BASE_0/power_via_cor_3_0/m2_6358_35238# name2 = POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/VSS Of interest: name1 = GF_NI_COR_BASE_0/dw_13436_13361# name2 = w_14068_57561# (not seen before, but clearly inside DVSS) Note: power_via_cor_5_0 is the one inside the DVSS domain. comp018green_esd_clamp_v5p0_2_0 is the DVSS domain clamp under it. name1 = POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/VSS name2 = GF_NI_COR_BASE_0/power_via_cor_3_0/m2_12842_49632# not in list above? (but it is VSS, so it's correct) (??) name1 = GF_NI_COR_BASE_0/power_via_cor_5_0/m1_14757_35210# name2 = POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/POWER_RAIL_COR_1_0/a_13097_44708# (both of these are VSS; the bottom one is the "other" side of a split tile) (??) not clear these are even in a ground domain? name1 = GF_NI_COR_BASE_0/power_via_cor_5_0/m1_14757_49610# name2 = POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/POWER_RAIL_COR_1_0/a_13097_44708# (both of these are VSS; the bottom one is the "other" side of a split tile) That was it, though. . . I thought this would be hard, but it's still harder than I thought. Time for a brute-force approach. Good. Listing all of the VSS entries above and checking against all of them produces only a handful of places where there were two entries and only one of them was in the list. 1st: GF_NI_COR_BASE name2 = comp018green_esd_clamp_v5p0_1_0/top_route_0/m1_6892_106# Check, but looks okay. (checked.) 2nd: POWER_RAIL_COR_0 name2 = POWER_RAIL_COR_1_0/VSS, missing from list. 3rd: POWER_RAIL_COR: name2 = POWER_RAIL_COR_0_0/VSS, missing from list. 
4th: gf180mcu_ocd_io__cor: name1 = GF_NI_COR_BASE_0/power_via_cor_3_0/m1_14757_35210# name2 = POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/POWER_RAIL_COR_1_0/a_13097_44708# (to be checked) 5th: gf180mcu_ocd_io__cor: name1 = GF_NI_COR_BASE_0/power_via_cor_3_0/m2_12842_49632# name2 = POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/VSS (to be checked, but looks okay) 6th: gf180mcu_ocd_io__cor: name1 = GF_NI_COR_BASE_0/power_via_cor_3_0/m2_8733_41714# name2 = POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/VSS (to be checked, but looks okay) 7th: gf180mcu_ocd_io__cor: name1 = GF_NI_COR_BASE_0/power_via_cor_3_0/m2_7640_52771# name2 = POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/VSS (to be checked, but looks okay) MORE brute force. . . Okay, found something: This is in the 2nd location where the second name is not explicitly defined. 1st: name "comp018green_esd_clamp_v5p0_1_0/top_route_0/m1_6892_106#", looks okay. (also, not in the top level) 2nd: name1 "GF_NI_COR_BASE_0/dw_13436_13361x#" vs. name2 "POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/POWER_RAIL_COR_1_0/a_13097_44708#" (check this) 3rd: node1 length 41, node2 length 6. node1 detected as dvss, node2 as vss. node2 components: POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/VSS VSS GF_NI_COR_BASE_0/power_via_cor_3_0/m2_12842_35238# GF_NI_COR_BASE_0/power_via_cor_3_0/m2_8741_35238# GF_NI_COR_BASE_0/comp018green_esd_clamp_v5p0_1_0/comp018green_esd_rc_v5p0_1_0/VMINUS GF_NI_COR_BASE_0/power_via_cor_3_0/m2_6358_35238# node1 components: GF_NI_COR_BASE_0/moscap_corner_3/VMINUS GF_NI_COR_BASE_0/power_via_cor_3_0/m1_14757_35210# node1 was already compromised. power_via_cor_3 should not have ended up in the DVSS list. Check specifically for this. 
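The brute-force filter described above (flag every merge where exactly one of the two names is already in the known list) is simple enough to write down. A Python sketch with a hypothetical function name; the real check was done inside the extraction code.

```python
def suspicious_merges(merges, known):
    """Return the merges in which exactly one of the two names is in `known`.

    `merges` is an iterable of (name1, name2) pairs in the order they
    occur; `known` is the list of names already attributed to one net.
    """
    known = set(known)
    return [(n1, n2) for n1, n2 in merges if (n1 in known) != (n2 in known)]
```

Each hit is either a name missing from the list (harmless) or the place where a foreign node enters the net, so the output still has to be checked by hand, as was done above.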
Spot 2, iteration 622,
node1 (length 1) = GF_NI_COR_BASE_0/power_via_cor_3_0/m1_14757_35210#
node2 (length 40) = GF_NI_COR_BASE_0/moscap_corner_3/VMINUS
Check other node2 names:
GF_NI_COR_BASE_0/moscap_routing_0/m1_n33412_n19921# (and similar)
GF_NI_COR_BASE_0/moscap_routing_0/a_n47022_n23957# (and similar)
POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/POWER_RAIL_COR_1_0/a_13097_44708#
GF_NI_COR_BASE_0/dw_13436_13361#
GF_NI_COR_BASE_0/dw_13436_13361x# <-- smoking gun? Cannot be BOTH sides.
"w_13448_13361#" <-- is not pointing to the opposite side. . .

After a diversion to correct the "goto" command for split tiles. . .
node1 (length 1) = w_13448_13361#
node2 (length 1) = GF_NI_COR_BASE_0/dw_13436_13361x#
There lies the smoking gun. At line 690 (3rd spot) node1 corresponds to cum = pwell on right side. It should have an "x" in the name. Tile at 43803, 14403. . . . which makes it no longer a smoking gun (maybe?), although now I need to find out why the node name doesn't have an "x".

How about
node1 = GF_NI_COR_BASE_0/dw_13436_13361# (VSS)
node2 = GF_NI_COR_BASE_0/moscap_corner_3_0/VMINUS (DVSS) ?
Except dw and dwx are now already both on the same node. Still not getting it. Again?

Okay, problem is that the incorrect merge wasn't checked for specifically, but can do that now. Looks like:
POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/POWER_RAIL_COR_1_0/a_13097_44708# (VSS) and
GF_NI_COR_BASE_0/dw_13436_13361x# (DVSS)
Now:
node1 = GF_NI_COR_BASE_0/dw_13436_13361x# and "w_13448_13361#"
node2 = POWER_RAIL_COR_0/POWER_RAIL_COR_0_0/POWER_RAIL_COR_1_0/a_13097_44708#
cum = split tile, dir = 1, isosub on right side, position 43803, 14403
(43803, 14403) to (43898, 14498). clearly correct for DVSS.
ha->hierOneTile = split tile, dir = 1, psd on left side, pos. 43718, 14336.
(43718, 14336) to (43880, 14498).
These appear to be overlapping. What's up?

*** Okay, this is definitely the smoking gun. The two triangular regions do not overlap but the rectangles do. Who is looking at what side?
dinfo, associated with cum, has TT_SIDE set, so looking at isosub, therefore DVSS. ha->hierOneTile does *not* have TT_SIDE set, so looking at psd on left side. Okay, the error is: ExtHier.c checks if the tiles touch, but does not check if two *split* tiles touch or not. First try at the disjoint triangle routine failed. Gotta do the math. Ahhh. . . Not 100% sure the code is right but I finally got a netlist that is correct. And the corner cell now passes LVS. -------------------End of digression Back to sky130 gpiov2 I/O cell. Still have to deal with the fact that the top_gpiov2 cell no longer passes LVS when it used to pass LVS, and the device count mismatches by three with metal1 resistors. At top_gpiov2 top level, check the VSSD net: In particular, most connections are the same except one nfet device, and (easier to find) sky130_fd_io__com_cclat/PU_DIS_H on the layout side at one 3.3V pFET drain/source on the layout side; the latter seems suspicious because the schematic netlist does not show any pFETs tied to ground. The issue with the gpiov2 cell looks like it might actually be relevant to the issue just fixed (or just attempted), because the PU_DIS_H line is close to ground along a diagonal, with diagonal lines from different cells overlapping. The directions of overlap are different from the example just fixed, so algorithmic errors are possible. . . Cell sky130_fd_io__com_opath_datoev2.ext merges sky130_fd_io__com_cclat_0/PU_DIS_H with VGND. NOTE: Selecting VGND in top_gpiov2 and doing "getnode" resulted in an immediate crash. Fix this first. Uh. . . Also: Selecting the VGND corner there *also* selects the PU_DIS_H wire in the same box area, which is wrong and didn't happen before. This with just a "select chunk". But the crash first. . . ExtBasic.c:960 ll is invalid. "node" is wrong. lreg->next is CLIENTDEFAULT. Came from "SimGetNodeName" so SimExtract.c probably has code that needs updating. Okay, found one. . . 
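A quick way to see why these two tiles cannot interact, as a Python sketch. It assumes that dir=1 means the diagonal runs from the upper left corner to the lower right corner (consistent with "top and right sides are the same" earlier in these notes); the dir=0 branch and both function names are assumptions, and this is not the routine that ended up in ExtHier.c.

```python
def split_triangle(rect, direction, side):
    """Return the three vertices of one half of a split tile.

    rect = (xlo, ylo, xhi, yhi); side 0 = left half, 1 = right half.
    direction=1 is assumed to be the upper-left to lower-right diagonal,
    direction=0 the lower-left to upper-right diagonal.
    """
    xlo, ylo, xhi, yhi = rect
    ll, ul, ur, lr = (xlo, ylo), (xlo, yhi), (xhi, yhi), (xhi, ylo)
    if direction:
        return [ul, ur, lr] if side else [ll, ul, lr]
    return [ll, ur, lr] if side else [ll, ul, ur]


def diagonals_separate(rect_right, rect_left):
    """Right half of `rect_right` against left half of `rect_left`, both dir=1.

    The right half lies in x + y >= xlo + yhi of its own rectangle and the
    left half in x + y <= xlo + yhi of its own rectangle, so the halves are
    certainly disjoint when the first constant exceeds the second, no matter
    how the rectangles overlap.
    """
    return rect_right[0] + rect_right[3] > rect_left[0] + rect_left[3]
```

For the pair above, cum (right half) gives 43803 + 14498 = 58301 and ha->hierOneTile (left half) gives 43718 + 14498 = 58216, so the triangles are separated by the diagonal even though the bounding rectangles overlap.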
(fixed)

Another NOTE: Haven't been able to get "getnode" to work on a large layout and this needs significant work. Maybe revisit whether code in sim/ is really necessary for that. Getting a node name should be very fast. Why isn't it?

First, avoid unnecessary overhead by seeing if the same extract error happens from the .mag version. Yes, it does. And indeed, ExtHier.c:153 is reached on sky130_fd_io__com_opath_datoev2.
cum = metal1 on right side of split tile at (6412, 1058) to (6461, 1107).
ha->hierOneTile = metal1 on left side of split tile at (6392, 845) to (6633, 1086).

Because I have clearly made bad assumptions in my routine to check if tiles overlap, I have turned to ChatGPT to tell me how to write a routine checking the overlap of two split tiles. Reworked a bunch of code around this, and consolidated the non-Manhattan interaction test between the database code for DBSrPaintNMArea() and the extraction.

Oh, dear, things seem to have been made worse. PU_DIS_H in cclat is still being connected to VSSD, but now a number of other signals are also being connected to VSSD.

Debugging: extHierConnectFunc2(): Break on dbEvalCorner().
cum = metal1 on right side of split tile at (6412, 1058) to (6461, 1107).
ha->hierOneTile = metal1 on left side of split tile at (6392, 845) to (6633, 1086).
At least it has determined that these two tiles are facing opposite directions and need the corner evaluation.
1st call: p = (6412, 1058)
r1 = (6412, 1058) to (6461, 1107) di1 = split | direction | side.
r2 = (6392, 845) to (6633, 1086) di2 = split | direction
in DBTestNMInteract, r (the area of overlap) is (6412, 1058) to (6461, 1086) (okay)
1st call: p = (6412, 1058) v = -473 (set vmin to this)
2nd call: p = (6461, 1086) v = -15257 (set vmin to this)
3rd call: p = (6412, 1086) v = -5849 (no action)
4th call: p = (6461, 1058) v = -9881 (no action)
Okay, fixed the problem with v needing to be initialized by the first corner. . . At least that's what ChatGPT says.
But it's still failing. 1st call: p = (6412, 1058) v = -473 (set vmin and vmax to this) 2nd call: p = (6461, 1086) v = -15257 (set vmin to this) 3rd call: p = (6412, 1086) v = -5849 (no action) 4th call: p = (6461, 1058) v = -9881 (no action) That worked, but. . . PU_DIS_H is still shorted to VGND in the .ext file. . . When does DBTestNMInteract() return TRUE for the hard case? Here we have area (7019, 1033) to (7044, 1058). dir = 1, side = 1. Tile t2 is at (7019, 1040) and is type metal1 on the right side, dir = 1. This routine is doing a search for "dbcUnconnectFunc()" and so is searching for non-connecting types (including TT_SPACE) overlapping the area. Tile t2 is (7019, 1040) to (7037, 1058). Now I recognize a serious problem, which is that the extract routine is looking for "interacting" shapes whereas DBSrPaintNMArea should be looking for "overlapping" shapes, and these two cases must be clearly disambiguated. (fixed) But still fails. Gave up on what ChatGPT produced and did something similar but different. Cases are still failing and connectivity always goes into an infinite loop but at least I understand the algorithm and equations. Now: Again, rect1 is (6412, 1058) to (6461, 1107). tt1 has DIR and SIDE set. t2 is at (6392, 845), DIR only. Corner evaluation: lower left = -1 upper left = -1 lower right = -1 Upper right corner is skipped. SIDE and DIR are set, so neg = 3 becomes pos = 3. Evaluates to "fully disjoint", which is correct. That was the only time this was called. . . Then maybe better to figure out the problem with net selection? Loaded sky130_fd_io__gpio_dat_ls_1v2 and selected VGND close to the bottom, m1. Got a break on dbEvalCorner. DBTestNMInteract() called with rect1 = (705, 1455) to (749, 1499), tt1 = DIR only. t2 at 705, 1455 is split m1 tile, m1 on left side, DIR=1. Got value -3, indicates fully enclosed. Is it? Yes, but the sides are the same, so why is it in dbEvalCorner? tt2 *does* have SIDE set. 
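The "interacting" versus "overlapping" distinction can be made explicit with a three-way result. The Python sketch below uses the standard separating axis test for convex polygons, which is a swapped-in textbook technique and not the corner-evaluation routine in DBTestNMInteract(); the function name is hypothetical, and polygons are assumed to have integer vertices and no zero-length edges.

```python
def classify(poly_a, poly_b):
    """Return 'overlap', 'touch' or 'disjoint' for two convex polygons.

    Each polygon is a list of (x, y) integer vertices in order (either
    winding). 'overlap' means the interiors intersect, 'touch' means only
    boundaries meet, 'disjoint' means there is a gap between the shapes.
    """
    touching = False
    for poly in (poly_a, poly_b):
        n = len(poly)
        for i in range(n):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
            nx, ny = y2 - y1, x1 - x2            # a normal to this edge
            proj_a = [nx * x + ny * y for x, y in poly_a]
            proj_b = [nx * x + ny * y for x, y in poly_b]
            if max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a):
                return "disjoint"                # a gap along this axis
            if max(proj_a) == min(proj_b) or max(proj_b) == min(proj_a):
                touching = True                  # zero-width contact here
    return "touch" if touching else "overlap"
```

Under this scheme a paint search would accept only 'overlap', while a connectivity search would also accept 'touch'. Whether contact at a single corner point should count as interacting is left open here.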
In which case they should be touching/nonoverlapping. Why is it even looking on that side? Yes, this is dbcUnconnectFunc. So this case fails. Easy to check. Works in the python code. What's different? Upper right check, . . . Oh. "r" still used but never defined. Selection now works, but it is now showing PU_DIS_H shorted to VSSD (when PU_DIS_H is selected, but not when VSSD is selected). If PU_DIS_H is selected on the straight wire to the right of the angled area, DBTestNMInteract() is called with rect1 = (7019, 1033) to (7044, 1058). tt1 has SIDE=1, DIR=1. t2 at (7019, 1040) has metal1 on right side, DIR=1, SIDE=0. 1st call is -1, 2nd is 0, 3rd is 0. SIDE=DIR in area, so result becomes pos=1, touch=2. pos+touch=3 so returns FALSE (unenclosed but touching). Is that right? Yes, the left side of the tile is space and the areas don't interact. Note there is also a weird case below the area where the selection. . . okay, that thing is actually real. Try again selecting the shape to the left of the area of concern. rect1 = (6378, 1161) to (6424, 1207), tt1 = DIR=1, SIDE=1 tp2 @ (6378, 1161), metal1 on left, DIR=1, SIDE=0. That's not the problem area, either. Try again with the extraction. Need to catch the nodes getting merged. That wasn't very productive. Probably better to go back to selecting PU_DIS_H and cycle through until the jump to VSSD occurs. Note: The node name of PU_DIS_H at the top level in the .mag file (because the label isn't properly connected) is "li_5797_1167#". It did show up in the extraction, being connected to ?/OUT_H_N, which layer is connected to ?cclat_0/VGND. Back to selection. 
When selecting the shape to the left of the area of concern, as mentioned above, here is the sequence of areas passed to DBTestNMInteract() when breaking at line 201 at the start of the "hard case": {p_x = 6378, p_y = 1161}, r_ur = {p_x = 6424, p_y = 1207} {p_x = 6377, p_y = 1160}, r_ur = {p_x = 6424, p_y = 1207} {p_x = 6378, p_y = 1161}, r_ur = {p_x = 6424, p_y = 1207} {p_x = 6358, p_y = 1107}, r_ur = {p_x = 6412, p_y = 1161} * not overlapping {p_x = 6412, p_y = 1058}, r_ur = {p_x = 6462, p_y = 1108} * overlapping {p_x = 6424, p_y = 1107}, r_ur = {p_x = 6478, p_y = 1161} {p_x = 6412, p_y = 1058}, r_ur = {p_x = 6461, p_y = 1107} * overlapping {p_x = 82, p_y = 1058}, r_ur = {p_x = 132, p_y = 1108} (?) ... Take the starred entries in turn: Area 6358 1107 6412 1161 DIR=1 SIDE=1. t2@(6358, 1107) DIR=1 SIDE=0 metal1 on right check is on space side of tile, so result should be disjoint (false) pos = 1, touch = 2, returns false (correct) Area 6412 1058 6462 1108 DIR=1 SIDE=1. t2@(6424, 1107) DIR=1 SIDE=0 metal1 on left check is on metal side of tile, tile is in the same net, should connect (true) neg = 3, returns true (correct, I think) Area 6412 1058 6461 1107 DIR=1 SIDE=1. t2@(6412, 1058) DIR=1 SIDE=0 metal1 on right check is on space side of tile. pos = 1, touch = 2 returns false. Seems like it is checking the wrong thing, like t2. Actually the next entry is the interesting one. The position (82, 1058) is the location in subcell sky130_fd_io__com_cclat of the overlapping tile from the parent cell. This is probably what needs inspecting. Area 82 1058 132 1108 DIR=1 SIDE=1. t2@(62, 845) DIR=1 SIDE=0 metal1 on left side. This is clearly checking the wrong side; maybe tt1 is not being adjusted for the child use transform? Or is it suspicious that DIR and SIDE are always 1? Going up the call stack, the tile is indeed in sky130_fd_io__com_cclat. dinfo *has* been transformed to scx->trans. dinfo = DIR=1, SIDE=1. new dinfo = DIR=1, SIDE=1. 
Trans = [1 0 6330 0 1 0] which is just a translation (correct).

Going up further in the stack: DBTreeCopyConnect() calls DBTreeSrNMTiles with scx->scx_area == (6412, 1058) to (6462, 1108), dinfo = DIR=1, SIDE=1 (confirmed)
In the child cell, this search becomes area (82, 1058) to (132, 1108) (confirmed) with the same split information (DIR=1, SIDE=1). "mask" contains only metal1.
Calls DBTestNMInteract to check this area against the tile at (62, 845) which is a split tile with metal on the left side. This check should go to the hard case (correct) but should return a result of "disjoint".
rect2 = (62, 845) to (303, 1086). Should evaluate lower left, upper left, and lower right.
lower left: result -1
upper right is skipped (correct)
upper left: result -1
lower right: result -1
returns FALSE (fully disjoint) (correct).
Okay, so that wasn't a smoking gun. Keep looking.

When does it jump to the VSSD region? (locations after area 82 1058 132 1108 above)
6412 1058 6461 1107: Overlap region again
7019 1033 7044 1058: Right side of PU_DIS_H (corner fillet)
7019 1033 7044 1058: again
7019 1033 7044 1058: and again
6412 1058 6461 1107: Back to the region of interest overlap
6478 1086 6499 1107: Position facing inward
6391 844 6633 1086: This is in VSSD. What is it doing here?
Something happened in the previous three checks? They didn't look interesting. . . Check where the t2 tile is in each case.

Uhh. . . Back to Area 82 1058 132 1108, tile is @(62, 845) confirmed above that result is disjoint. I'm thinking that something is logically incorrect before it gets to line 201?

Start looking at *all* calls to DBTestNMInteract(), find a t2 that belongs to VSSD, and then find when DBTestNMInteract returns TRUE.
Here's a new one: area 147 1085 169 1107 checks against tile (62 845) in cclat returned true (but should not have).
Oof---at line 190, returned "true". Tiles have the same TT_SIDE but do not interact. In this case, no part of the *rectangle* of t2 is in the area of rect1.
So it should have returned FALSE from the code above, but didn't. No, that's not right. Here, t2 is the larger rectangle, so t2 definitely overlaps rect1. Maybe it's sufficient to just remove that line? Won't the rest of the code determine the interaction correctly, regardless of how TT_SIDE is set? Let's try it! Didn't work. . . maybe the pos/neg swap is wrong in this case?
(Note: Found this by breaking on DBTestNMInteract when t2->ti_ll.p_x == 62 && t2->ti_ll.p_y == 845. 2nd time was the one that failed.)
Keep forgetting that t2 is the larger rectangle. Implemented something but assumptions still seem to be wrong. Actually, just missing parentheses. Finally, it's working!!!

Still errors on top_gpiov2 but VSSD now matches, so that part of the extraction was correct.

Investigating the remaining top_gpiov2 errors: Mismatch of three rm1 resistors may be coming from sky130_fd_io__gpiov2_octl_dat. Can try a "noflatten" on that cell and see. May have another digression because netgen crashed. (Fixed, needs attention later)

Worse, the GDS-derived netlist and the .mag-derived netlist differ, with netgen reporting not only the three metal 1 resistors, but now also a metal 3 resistor. Check sky130_fd_io__sio_signal_5_sym_hv_local_5term? I think there may be a discrepancy between the use of this in the ovtv2 pad and the gpiov2 pad. The netlist "incorrectly" uses the ovtv2 version of buf_localesd in the gpiov2 pad.

sky130_fd_io__gpiov2_octl_dat layout has a split DM_H[1] pin (confirm?). This prevents correct LVS, but if it is flattened, then additional rm1 devices appear.

Need unrelated work: "ad" and "pd" values (usually) match relative to what magic used to produce, but "as" and "ps" are way off. Something's up with that.

Just noticed that res_generic_m1 devices are being removed, counting as "zero valued" resistors. But they aren't zero-valued; they're semiconductor resistors, and that should never happen and is netgen's fault. And I need regression tests. . .
According to netgen code, that should not have happened. Indeed, error got introduced accidentally into netgen last month. Fixed. That seems to have gotten the top_gpiov2 cell back into striking range of LVS clean. The most obvious issue left is that ENABLE_H is connected to 1 each 3.3V nFET and pFET devices in the layout but 3 each in the netlist. Layout shows definitely connected to 6 devices. Extraction error? Layout also shows that sky130_fd_io__gpiov2_ctl has all of the transistors but they're broken into two unconnected groups. The labeled group of four is getting disconnected, and this is almost certainly an introduced extraction error (dang.). Name of disconnected part of the net is a_11799_3638# in sky130_fd_io__gpiov2_ctl. This is correctly listed as a port of gpiov2_ctl after DM[1], the 19th port. However, that's the part that is found. The part of the net labeled ENABLE_H is the 6th port. In sky130_fd_io__top_gpiov2, the call to sky130_fd_io__gpiov2_ctl has as the 6th port "sky130_fd_io__gpiov2_ctl_0/ENABLE_H" and as 19th port ENABLE_H, so this is unfortunately a merge that failed to happen. Re-extract and inspect the .ext files. (Although technically, need to follow the GDS import from the script to make a 1:1 comparison.) (Both files are the same, so it's okay to use the .mag for extraction testing.). ctl_0/ENABLE_H does not even appear in the .ext file. Will have to track down how the net connectivity of ENABLE_H is traced in the extraction. How could it miss just one connection of one net? If I punt on this one by correcting it manually in the netlist, then there is still a failure in which "sky130_fd_io__top_gpiov2_0/\ sky130_fd_io__gpio_opathv2_0/sky130_fd_io__gpio_odrvrv2_0/PU_H_N[0]" has also been disconnected. But this follows a ridiculously long and winding path, and looks like it has a good chance of being produced by the same bug, and I'd rather debug from the ENABLE_H signal. 
Also it's harder to figure out what to change in the netlist to correct it manually.

Most likely involves a plane-to-plane connection through a via? Problem should be found around extHierConnections(). However, since the problem is a *failure* to merge two nodes, watching the node merge function isn't going to be much help. Will need to track specific locations.

Top cell will be sky130_fd_io__top_gpiov2. ENABLE_H starts with the pin on metal2 at (7092, 0) to (7144, 402). Since the connection to the right is found, then any disconnect happens above. There is an angled wire that starts at y=402. The connection between the two nodes happens in the area of (7028, 672) to (7078, 782). The connecting cell underneath is sky130_fd_io__gpiov2_ctl. The same connecting area in the coordinates of the child cell is around (10536, 4166) to (10584, 4280). The child and parent are in the same orientation, with X and Y offsets. However, the connection passes only a brief distance before descending again to cell sky130_fd_io__com_ctl_hldv2, which is not rotated 180 degrees.

It should be able to be determined from the .ext files if the lower connection was made. The node name in sky130_fd_io__com_ctl_hldv2 is a_2671_3554#. The node name in the parent is ENABLE_H, but only through the child connection. The connecting m1 piece should have a name m1_10103_4165#. This might be the likely point of failure, because the metal line has a split tile at the lower left corner which has the unusual property of coming to a point at the lower left, because of the horrid layout job done on these cells.

Conclusion: The sky130_fd_io__gpiov2_ctl.ext file does connect ENABLE_H to both m1_10103_4165# and to sky130_fd_io__com_ctl_hldv2_0/a_2671_3554#.

Clue! sky130_fd_io__gpiov2_ctl DOES make the connection, but to "sky130_fd_io__gpiov2_ctl_0/m1_10103_4165x#". Therefore, the error is that the "x" is not accounting for the subcell rotation (or something like that).
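To make the naming mismatch concrete: the node name seen in the child's own .ext file is "m1_10103_4165#", while the parent refers to "m1_10103_4165x#", and the only difference is a suffix that is tied to the side bit of a split tile. A minimal sketch of that convention follows. The function name, the flag value, and the format string are all assumptions for illustration and are not taken from Magic's source; the sketch also does not try to reproduce how Magic formats negative coordinates.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag for this sketch only; NOT Magic's actual TT_SIDE value. */
#define SKETCH_SIDE_RIGHT 0x1u

/*
 * Build a default node name of the form "<prefix>_<x>_<y>[x]#" into buf.
 * The "x" suffix is appended only when the side bit is set in dinfo,
 * i.e. when the name refers to the right side of a split tile.
 * Returns the number of characters that snprintf() would have written.
 */
int sketchNodeName(char *buf, size_t len, const char *prefix,
                   int x, int y, unsigned dinfo)
{
    assert(buf != NULL && len > 0 && prefix != NULL);
    return snprintf(buf, len, "%s_%d_%d%s#", prefix, x, y,
                    (dinfo & SKETCH_SIDE_RIGHT) ? "x" : "");
}
```

With the side bit clear, the tile at (10103, 4165) yields "m1_10103_4165#"; with it set, "m1_10103_4165x#". If the two cells disagree about the side bit for the same tile, the names differ and no merge can be matched up.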
No, I don't think it's rotation, and the code is very clear that "x" is added when the type is on the right side, and that is indicated by lreg->lreg_type. There is no rotation going on here between sky130_fd_io__top_gpiov2 and sky130_fd_io__gpiov2_ctl. That means that TT_SIDE got set for lreg_type incorrectly. Should be easily checkable, as it indicates that extSetNodeNum() was called with tile->ti_ll = (10103, 4165) and dinfo & TT_SIDE != 0. (Should probably have started when running extraction on the top cell, because the conditional severely impacts extraction time. But did confirm that extSetNodeNum() is called properly within sky130_fd_io__gpiov2_ctl with dinfo only set to zero.) Did not happen.

There is a fundamental issue: The LabRegion structure doesn't even have a tile type. "lreg_type" is being used to track the dinfo for "treg_tile", which is probably stupid. "lreg_type" upper bits should either follow the type of the tile at "lreg_ll" or else flag when "lreg_ll" is the right side of the node (thus triggering the "x" in the name, or maybe change that to "r" for "right"). The treg_tile dinfo should be an extra field in the TransRegion type.

Not entirely sure it's necessary to have an extra field. If treg_type is just the type of treg_tile, then treg_type is not needed other than to store TT_SIDE. May be okay just to always reassign TT_SIDE when reassigning lreg_ll and lreg_type to a new tile position.

Okay, that's probably an incorrect assumption given that "(none)" nodes started appearing in the output. Went back and did the other idea, which is to add a record to TransRegion for "dinfo" and leave treg_type with TT_SIDE corresponding to the region lreg_ll position, not necessarily treg_tile. Seemed to work.

Now there is one LVS error which is the netlist issue with gpiov2 vs. gpio_ovtv2 for the connection to buf_localesd, a change which I reverted because I wasn't sure if it was needed or not. Applying the change again. . . And voila!
gpiov2 is LVS clean again! (Except for the ESD transistor width calculation, which now needs to be modified to properly handle non-Manhattan edges of the device.)

Now to work on the newer I/O cells gpio_ovtv2, sio, etc. Quick pass at both gpio_ovtv2 and sio result in netlists that do not appear to be too far from clean; as in just a few shorts/opens. Specifically:

1. For gpio_ovtv2, it looks like two nets in the netlist are shorted to the pad, and those are the only errors. There is one component mismatch, but that appears to be due to the shorted nets. There was at one time a hack solution for gpio_ovtv2 to remove the flanged gates from the layout; this might still be being done in the open_pdks install. Need to check. Should be resolved now.
2. For sio (not macro), vssd and vssio are shorted in the layout. There seem to be a few other shorts/opens elsewhere.
3. The gpio_ovtv2 cell LVS from the existing PDK .mag view still requires the flattening-in-place of the amux_i2c_fix cell, or else the ground nets get shorted.
4. run_top_pwrdetv2.sh is LVS clean.
5. run_top_analog_pad.sh is LVS clean.
6. run_top_refgen_new.sh has an LVS match but a property error (which is not new and apparently has not been fixed)

Digression (but important bug): source and drain calculations are off.

Test: The default nMOS device from the device generator. Source and drain are exactly symmetric. .ext file output has correct length of the terminals (84, same as device gate width), but "2088,101" for drain area/perimeter and "2436,142" for source area/perimeter. Actual area and perimeter for both are: A=4872, P=284. "Source" has exactly half of these values---Suggests it was being "shared" with another non-existing device. Where it got the drain values from is unclear.

Since the area/drain searching algorithm is the one that got rewritten, it is probably at fault. Routine is now "extEnumTerminal()"; break on that.
    1st call: Tile at -73, -42 (type ndiff)
    2nd call: Tile at 15, 30 (type ndiff).
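The "exactly half" observation above is what an even split between two devices would produce: 4872 / 2 = 2436 and 284 / 2 = 142. A minimal sketch of that arithmetic follows, assuming a terminal's totals are divided evenly by the number of devices sharing it. The struct and function names are hypothetical, and this is not a reproduction of what extEnumTerminal() does internally.

```c
#include <assert.h>

/* Hypothetical record for this sketch; not a structure from Magic. */
typedef struct {
    int area;
    int perim;
} SketchTermAP;

/*
 * Divide a terminal's total area and perimeter evenly among the
 * devices that share it.  nshared must be at least 1: a terminal
 * that belongs to a single device is "shared" by exactly one.
 */
SketchTermAP sketchShareTerminal(int totalArea, int totalPerim, int nshared)
{
    SketchTermAP ap;

    assert(nshared > 0);
    ap.area = totalArea / nshared;
    ap.perim = totalPerim / nshared;
    return ap;
}
```

Calling sketchShareTerminal(4872, 284, 2) gives area 2436 and perimeter 142, matching the reported source values, while a share count of 1 returns the full 4872 and 284 expected for this single, symmetric device.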
There is a contact on each side and the terminal region for each side should be 5 tiles total. Tile at -73, -42 is tile below the contact on the left side. Tile at 15, 30 is the tile above the contact on the right side.
    (1) tileArea = -73, -42 to -15, -30
    (2) eapd_area = 696 eapd_perim = 82 (added 6 lengths)
    (3) added shared device (self)
    (4) continue search, added tile above (-27, -30) to linked list.
    (5) continue search, added tile above (-61, -30) to linked list.
    (6) went to termdone. . . missed a tile.

Two sides are using the wrong comparison. Drain and source match now. But values are still cut in half. The "addShared" subroutine only ever sees one gate node, so nothing is shared. It is starting "shared" at one, but there is always one record, so minimum "shared" is coming out 2---Has it always been this way??? Anyway, it's correct now.

---------------

Back to study of gpio_ovtv2 and sio.

gpio_ovtv2: From magic, no flattening in place. Results in ground short.

gpio_ovtv2: From magic, use flatten-in-place. Issue is a short between PAD and two other nets which in the netlist are:
    1) sky130_fd_io__gpio_ovtv2_opath_i2c_fix_leak_fix:opath_q0/\
       sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix:odrvr_q0/p2g
    2) sky130_fd_io__gpio_ovtv2_opath_i2c_fix_leak_fix:opath_q0/\
       sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix:odrvr_q0/net74

The schematic has the esd_nfets and esd_pfets divided into three nets. The division looks pretty clear from the layout; one esd_nfet is offset from the others; the three esd_pfets on the left and right sides are connected differently. However, at issue is what happens to the esd_pfet gates. The layout-extracted netlist has the esd_pfet gates shorted and tied to pad. This does not match the connectivity selection view. This had been a problem before and I was hoping it would have been resolved by the new code. At issue are numerous flanged gates on the ESD pFETs.
The "fix_leak_fix" cell, which is where the errors are located, is "nearly" matching with the layout having an isolated and unmarked section of VDDIO. "p2g" net has two rm1 resistor ends, an ESD pfet gate (may be multiple devices in parallel), and one each 5V pFET and 5V nFET source/drain. This looks like it is the gates of the 5 pairs of ESD pfets on the top row except for the rightmost pair (which is "net74", see below). If so, then the node is a_7678_12770#. "net74" has one rm1 resistor end, one rm2 resistor end, one ESD pfet gate, and one poly resistor end. Any such net in the layout? Looks like maybe the gates of the ESD pfet pair at top right. If so, this would be node a_20646_12770# in sky130_fd_io__gpio_ovtv2_opath_i2c_fix_leak_fix. Therefore, find in the .ext files where a_7678_12770# is merged to PAD and where a_20646_12770# is merged to PAD, or where they are merged to each other. Extraction doesn't match connectivity. Connectivity selection is correct. If I recall correctly from before, there was an issue with the flanged gates shorting across gate to source, or rather the node region being confused between the two. This should not happen any more, but obviously just did. The two nodes cited above do not appear in the top level .ext file. "getnode" is producing garbled crap for the prefixes in the net names (need to fix!) but at the top level, the suffixes on the nets are showing as: "net74": a_21780_5642# "p2g": m1_3189_4326# These two names are not found at the top level, either. The "merge" lines for PAD are found in sky130_fd_io__top_gpio_ovtv2.ext at lines 1468 to 1497. There is a clear issue although it doesn't appear to be exactly related to the problem at hand; the PAD net is merging to nodes which are showing as being the gate side of a split tile while the PAD net is the source/drain side. A survey shows that "x#" shows up exactly nowhere in the .ext output. So this might be relatively simple! 
But it implies that lreg_type never has TT_SIDE set. Why not? Ech, changing something that was supposedly fixed already, so will have to re-check other layouts involving split regions. After fixing (again), nothing changed. . .

Really need to fix "getnode", this is interfering with debugging. (Specifically, SimGetnode()). This could be a malloc issue? SimSelectArea() returns TileListElt pointer. . . Where does this list get deallocated?? SimFreeNodeList(); bleah, it's a global variable and the linked list just hangs around until the routine is called again. Seems like this needs work and has not yet been addressed.

See SimTreeCopyConnect(). . . At SimDBstuff.c:132, tpath->tp_last is the garbled part. tp_first has the path that should be returned but somehow the garbled part is what's getting returned. tp_last is a sentinel pointer at the end of the pathName string, and it should not be in the output as far as I can tell. SimGetNodeName() somewhere returned a bogus result.

SimGetNodeName() was called from SimDBstuff.c:136 with bogus tpath->tp_first
    from cx->tc_filter->tf_tpath
    cx defined at SimDBstuff.c:766 and tc_filter set to fp (= cdata).
    fp is itself a TreeFilter pointer.
    and fp->tf_arg is, too?
    fp is cxp->tc_filter.
    cxp is a contact which took tf_arg from cdarg
    cdarg set at SimDBstuff.c:820 from "fp"
    fp from cdata passed to SimCellTileSrFunc

So ultimately, tpath came from SimDBstuff.c:399. Maybe watch tpath.tp_first changes starting from SimDBstuff.c:366. It changed to a bogus value in SimCellTileSrFunc() but that specific location appears to have been deallocated. Try valgrind again. . . Valgrind comes through. . . Looks like csa->csa_clientDefault needs to change to match other things I did with "clientDefault" in the DBconnect routines. . . Fixed. . . But still no joy. Might be faster to parse DBconnect against SimDBstuff? Did that, nothing obvious. Now valgrind shows no error but the tpath prefix is missing.
Produced garbage in prefix but not when running under valgrind. Don't know why. Ugh. Will need investigating. Doesn't even address the current problem with extraction. "getnode" has issues, but maybe can debug the problem with "goto" only? The shorted nodes seem not to be in the merge list at all. . . Assuming that the .spice file is wrong, then there are two conflicting names in the .ext file somewhere such that nodes are effectively shorted without a "merge" statement. Could possibly happen between sky130_fd_io__gpio_ovtv2_opath_i2c_fix_leak_fix and sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix? Actually, this is likely, given the issues with "getnode". Could be in sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix itself, but for example node a_19583_5855# has only 2 merge lines: sky130_fd_io__gpio_ovtv2_odrvr_sub_leak_fix_0/sky130_fd_io__gpio_ovtv2_pudrvr_sub_0/\ sky130_fd_io__gpio_ovtv2_pudrvr_strong_0/a_21780_5642# sky130_fd_io__gpio_ovtv2_odrvr_sub_leak_fix_0/m1_24520_7190# a_19583_5855# Oh, sky130_fd_io__gpio_ovtv2_opath_i2c_fix_leak_fix.ext is a mess. . . There is a "(none)" node in this set, among 39 "merge" entries. Also there are FETs drawn on top of FETs here. Will at least produce device size errors. Yes, the node shorts are here. Debugging: The "(none)" is returned for tp = tile at 4436, 12770, mvpmosesd on left side, mvpdiff on right side, dir=0. dinfo indicates looking at the right side. Breaking on extSubtreeHardNode for this tile. lreg for this tile has ll=4436, 12488 and lreg_type has TT_SIDE. extSubtreeHardSearch() did nothing. Now sets "autogen" and retries. ExtFindRegions() produced nothing. def is sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix scx_area is 3423, 6287 to 3443, 6307. scx_area is 3574 3773 to 3594 3793 def is sky130_fd_io__gpio_ovtv2_pudrvr_sub scx_area is 579 2474 to 599 2494 def is sky130_fd_io__gpio_ovtv2_hotswap_vpb_bias scx_area is 2722 9349 to 2742 9369 ... go back and check what's in the first search area. 
It's the flanged gate of the leftmost of the ESD pFETs. The area is an exact match to the split tile, so if it found nothing it would have to be because the wrong side of the tile was being searched, or the split was mishandled somehow. Do that again, break on ExtSubtree.c:1255 if tp->ti_ll.p_x == 4436 && tp->ti_ll.p_y == 12770. So dinfo has TT_SIDE set to 1, so it is ostensibly looking at the right side of the tile where the tile has pdiff (not transistor).

extHierConnections() called extHierConnectFunc1() (ha_subArea = 4335 12350 5737 13752). From here, it is looking for connections to the right side of the tile, so that seems correct insofar as it goes. This is at one point where mvpmos is on top of mvpmosesd. This could account for the (none) type since these types do not necessarily connect. May be a red herring.

Will instead need to do the same thing as done once previously, which is to watch the node merges. Divide the (long) list of nodes into the three expected nets. The three nets are: PAD (1), the ESD pFET gates on the right (2), and the ESD pFET gates on the left (3).

    sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix_0/a_19583_5855# (2)
    a_20646_12358# (2) (split tile, left side)
    a_20646_12770# (2)
    a_17404_12358# (2)

The four nodes above are the easiest to check for because there are only four of them. When they end up in the connection list with anything else, an error has occurred.

    a_20646_12488r# (1) (split tile, right side)
    a_17404_12488r# (1)
    a_14162_12488r# (1)
    a_7678_12488r# (1)
    a_4436_12488r# (1)
    a_10920_12488r# (1)
    sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix_0/a_3423_6005r# (1)
    sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix_0/a_1560_18645# (3)
    a_20646_12900r# (4) ??? This is net VPB_DRVR. Now I'm very confused.
    a_17404_12770# (3)
    a_17404_12900r# (4)
    a_14162_12358# (3)
    a_14162_12770# (3)
    a_14162_12900r# (4)
    a_10920_12358# (3)
    a_10920_12770# (3)
    a_10920_12900r# (4)
    a_7678_12358# (3)
    a_7678_12770# (3)
    a_7678_12900r# (4)
    a_4436_12358# (3)
    sky130_fd_io__gpio_ovtv2_octl_dat_i2c_fix_leak_fix_0/w_2900_10961# (4)
    a_4436_12770# (3)
    a_4436_12900r# (4)
    a_20646_12146# (4)
    a_17404_12146# (4)
    a_14162_12146# (4)
    a_10920_12146# (4)
    a_7678_12146# (4)
    a_4436_12146# (4)
    sky130_fd_io__gpio_ovtv2_octl_dat_i2c_fix_leak_fix_0/w_2457_9746# (4)
    sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix_0/sky130_fd_io__gpio_ovtv2_odrvr_sub_leak_fix_0/VPB_DRVR (4)
    sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix_0/sky130_fd_io__gpio_ovtv2_odrvr_sub_leak_fix_0/sky130_fd_io__gpio_ovtv2_pudrvr_sub_0/sky130_fd_io__gpio_ovtv2_pudrvr_weak_1_0/VPB_DRVR (4)
    sky130_fd_io__gpio_ovtv2_odrvr_i2c_fix_leak_fix_0/VPB_DRVR (4)

----------------

Okay, caught it trying to merge net (2) with something:
    node1 length 1 "a_20646_12900r#"
    node2 length 35 (has already merged node (2) with something? How can that be?)

So run again while looking for any node from (2) appearing in either list. Inconclusive. Do more string comparisons to catch another net? I'm so confused. . . Using net (1) looks better. . .
    node 1 length 1 "a_4436_12358#" (from net 3)
    node 2 length 3:
        ".../a_3423_6005r#" (net 1)
        "(none)" (who knows?)
        "a_4436_12488r#" (also net 1)

Okay, then, caught magic trying to merge net 1 to net 3, ExtHier.c:693
    cum @ (4436, 12488) to (4456, 12508) type mvpmos on left, type mvpdiff on right, SIDE=1 so looking at mvpdiff
    ha->hierOneTile at (4436, 12508) to (5737, 12588) type mvpdiff (not split)

Now check these two tiles. cum is on the leftmost flanged gate but the right side points to the middle node which is (1). hierOneTile is right above this so it is also (1). The mismatch is that "cum" is looking at the right side but picked up a node name for net3.
"cum" does not have a region associated with it, so it would have had to pick up this node from the hash table. Run again: Go up one: name1 = "a_4436_12358#". How in heck did it get that name from that tile?? extSubtreeTileToNode(): ttype = mvpdiff. Looking in this triangle for a region in the original layout cell. So for this area, extConnFindFunc() returned the wrong region. Run again but be sure to break at extConnFindFunc(). extConnFindFunc looks at a tile at (4436, 12488) but dinfo has SIDE 0. Why? Okay, apparently DBTestNMInteract() failed. Make sure this is true: rect = (4436, 12488) to (4456 12508), ttype = TT_SIDE tp = (4436, 12488) to (4456, 12508), tpdi = !TT_SIDE Pretty clear failure. Need to figure out why, but also worth considering a quick check in DBTestNMInteract() for the somewhat common case of the same area, same diagonal, which can be analyzed instantly. But for now, break on DBtiles.c:436 if tp is at 4436, 12488. Oh, one has the wrong direction. . . The dinfo record is not guaranteed to have TT_DIRECTION. Uh, but tt1 passed to DBTestNMInteract needs to have TT_DIRECTION, because there's no tile associated with it. It should be, though, because DBSrPaintNMArea gets ttype. Need to check this throughout the code! In most (many? some?) cases, routines are passed tile + dinfo, where dinfo is only guaranteed to contain TT_SIDE. If the routine then calls DBNMPaintPlane, the TT_DIRECTION and TT_DIAGONAL must be set appropriately! This may be wrong in more than one place! (Yes, it was.) Yet, I'm still seeing conflicting nodes. Why? Looking at the .ext file, doesn't seem to have fixed a damn thing. Fixed an error in the diagnostic reporting, should help with the confusion. node1 length 1 "a_4436_12488r#" (1) node2 length 5 "...sub_leak_fix_0/PU_H_N[2]" "(none)" (?!) "...pudrvr_strong_0/PU_H_N[2]" "...fix_leak_fix_0/PU_H_N[2]" "m1_2996_14802#" Leading to this: cum : split tile at (4436, 12770) DIR=0, mvpmos on left, mvpdiff on right. dinfo: SIDE=1. 
ha->hierOneTile : split tile at (4436, 12770) DIR=0, hierType SIDE=1. But name2 = "(none)". "(none)" is causing the crossover. "(none)" is being caused by the improper type overlays but should not be used to find a net. That helped a lot, but I'm still getting one conflicting node message. node1 length 1 "a_17404_12358#" (2) node2 length 10 "...i2c_fix_leak_fix_0/a_1560_18645#" (3) "a_17404_12770#" (3) Leading to this: cum : split tile at (17404, 12488) to (17424, 12508) DIR=1, mvpmos on left, mvpdiff on right. dinfo: SIDE=0 (looking at mvpmos) ha->hierOneTile : poly at (17262, 12484) to (17404, 12508) name1 seems not right for tile cum. Take a look on the layout: "cum" is not even close to net (2). That might be my own clerical error. Oh, my, I got an LVS match!!!