Having abandoned the attempt to redefine a split tile as two separate tile entries, I am returning to the problem of removing TT_SIDE from the database. TT_SIDE has been used to pass information to callback routines to indicate which side of a tile should be processed. This should never have been done, because it causes the database to be altered during searches, which prevents searches from being parallelized. To remove this dependency: All the basic search functions will require an additional argument which is a boolean and indicates which side of a split tile is to be processed by the callback. It is probably fine to treat the argument like "dinfo", which is to make it a TileType and set TT_DIAGONAL, TT_SIDE, and TT_DIRECTION as needed. For the basic use of callbacks, it will generally suffice to set only TT_SIDE. ----------- Both TT_SIDE and SplitSide() are used frequently. Which is why I have not done this before. First, enumerate all callback functions which use each of the above, and the search routine that will require the extra argument. Search functions or functions needing changes: (Followed by the number of occurrences) DBSrPaintArea() 401 DBTreeSrTiles() 99 DBSrPaintNMArea() 16 DBSrPaintClient() 15 SimTreeSrTiles() SelEnumPaint() GrClipTriangle() GrDrawTriangleEdge() GrDiagonal() DBTransformDiagonal() GrBox() SplitSide(): calmaMergePaintFunc() --- Comment, unused, remove. calmaWritePaintFunc() calmaMergePaintFuncZ() --- Comment, unused, remove. calmaWritePaintFuncZ() cifHierCopyFunc() cifHierErrorFunc() cmdDropPaintCell() To be fixed: STACKPUSH/STACKPOP stuff in resis/ResUtils.c resAddField will need to do something similar to extract to handle two clientData records for split tiles in ResAddPlumbing and ResRemovePlumbing. Some changes to stack handling were pretty hastily done and all should be checked for consistency. dbcUnconnectFunc() --- Need to handle side, or is this already handled? extConnFindFunc() --- Select proper region depending on side extInterSubtreeTile() --- Should handle non-Manhattan tiles. lefConnectFunc() --- write polygons to LEF? resExpandDevFunc() --- Probably needs to handle split types and replace STACKPUSH. (Lots to do in ResUtils.c) touchingTypesFunc() --- Should distinguish types between sides, and handle triangle geometry. cifInteractingRegions() --- Uses the cifSquares code, but cifSquares does not handle non-Manhattan geometry, and interacting/overlapping methods must. antennaAccumFunc() --- Needs to handle non-Manhattan geometry. extTransPerimFunc() --- Needs to set TT_SIDE based on boundary direction before calling DBSrConnectOnePlane (ExtBasic.c:3882) ResMultiPlaneFunc() --- Needs to call ResNewSDDevice() with dinfo information Any routine that does not use "dinfo" should do "if (dinfo & TT_SIDE) return 0". No, that's not needed. Ought to create a single function for simply returning "1" and replace all the assorted functions that do that individually. --------------------- First pass is done, minus issues reported above that need to be worked on. The basic issue of adding an extra argument to all the callback functions is basically completed. Moving on to compiling. . . CIFgen.c has lots of warnings around PUSHTILE(). Probably an incorrect cast, but I need to check all of the uses of PUSH and POP everywhere, anyway. (done) 1/2/2026: Compiles now! Debug. . . Errors in dbwind (missed a DBSrPaintArea() call) Need to pass dinfo to GrBox() and GrBoxOutline() Reminder: Need to check for all instances of SplitSide(), as there should not be any. There are still 38 references that need to be removed. (done) Fix: extWalkTop (Bottom, Right, Left). . . type should not have been set by the split side of tile "tp". Should be according to the side being searched and the diagonal directions of "tile" and "tp". (done) ResUtils: Cannot use STACKPUSH and STACKPOP A number of remaining uses of SplitSide() are in routines that were not fixed for dinfo or are not directly called from modified search functions, so will need to fix each one in turn. (done) Add dinfo to extSubtreeTileToNode() and extSubtreeHardNode() extNodeToTile needs to return dinfo. . . (done) extGetRegion() will need to be handled but only when I'm done with this and working on being able to attach two regions to a split tile. =============== To do: Fix the several places where the compiler spits out warnings. (1) ExtHier.c:43 use of TileType in extNodeToTile (extractInt.h:1060) (done) (2) CIFgen.c:1237, 1339: Issues with using PUSHTILE and STACKPUSH, should not cast dinfo to type ClientData; use INT2CD(dinfo). Also didn't like (TileType)STACKPOP. . . Use (TileType)CD2INT(STACKPOP(...)) (done) Okay, those are fixed. =============== Large-scale tests: (1) Ran on gf180mcu_ocd_sram_test; Running full DRC. (passed) (not exhaustive!) Also need to test: GDS output (passed) (eh, not exactly) (okay, good now) GDS input (passed) extraction (oops, segfault) (ExtHier.c:518) extresist net selection antenna checks LEF read LEF write DEF read DEF write Especially need to check the sky130 I/O where there are split tiles with both sides active. Still need to resolve the issue with attaching two net regions to a single split tile, and to return the correct region entry. Oops, reading gf180mcu_ocd_sram_top.gds.gz back after writing, then letting DRC run, crashed at some point. Probably in DRC, but unsure. Will try to repeat. Yes, missed a routine drcSubCopyErrors(). Because it's run from DBNoTreeSrTiles(). Another issue---GDS input failed to read in some non-manhattan tiles; use klayout to make sure that GDS output was correct. (yes) GDS may have just been corrected after an error was found; check GDS read again (nope). Doesn't happen with metal1, for example, only with psd (in the GF tech). Issue is that PPLUS triangles are inverted in a number of places, so output generation was messed up somewhere. Tested and found "shrink" to be the cause. DBDiagonalProc() was the cause. Its setting of TT_SIDE at the end was non-functional (DBUndo does not use it, as claimed), and the bit was getting set in the tile and disrupting any code using "TiGetTypeExact(tile) | dinfo". Maybe there should be a guard against anything setting the TT_SIDE bit in a Tile's ti_body field? Re-running run_gen_gds.sh on the SRAM test chip to see if that fixed the I/O corner cell. Well. . . Almost. cifoutput hierarchical checks generated extra non-manhattan geometry on the top level which is not *wrong* but shouldn't be there and was not there before the code changes. Found a place where "dinfo" was not handled; fixed, and retrying. . . Good. ------------------------- Next major error: PUSHTILE caused a "corrupted double-linked list (not small)" error, from ExtNghbors.c:197. Modifications to the code that shouldn't have changed anything seem to have made this pass, although can't tell yet if it works correctly. Getting another error "free(): invalid next size (fast)" error on ExtFreeLabRegions() now. . . Maybe best debugged with valgrind. . . Looks like it comes down to "extSubtreeHardNode()" being passed a split tile. Before that, extSubtreeTileToNode() on a split tile. From extHierConnectFunc2(), ha->hierOneTile is split. Split dir = 1, right type = 117 (metal 3), left type = space Typical case. . . ha->hierType looks correct. TT_SIDE set, looking at right side, which is metal3. Down in extSubtreeHardNode(), ttype = 117 (okay) extSubtreeHardUseFunc called on POWER_RAIL_COR_1_0. Then it calls ExtFindRegions on POWER_RAIL_COR_1, which created the label region, and then calls ExtLabelRegions, where tile tp is a simple metal 3 tile at (46068, 14000), "reg" is its ti_client, but reg had been freed. Therefore, ExtFindNeighbors() at extract/ExtHard.c:509 did not search all of the tiles that were originally tagged by ExtFindRegions() at extract/ExtHard.c:207. Maybe this is due to how reg->treg_tile and reg->treg_type are set? ExtFindRegions calls DBSrPaintClient() with callback extRegionAreaFunc() extRegionAreaFunc() calls ExtFindNeighbors(). ExtFindNeighbors() uses macros like PUSHTILERIGHT, etc., which could always be wrong, as could the use use of dinfo when deciding what to check and what to skip. Double-check everything in this routine. (looks okay) The most problematic case is if a region's tile is set to a split tile. This will eventually not be a problem when the regions are handled between splits. But for now, it might cause serious problems. Try breaking when this happens and see if that might be related (as it, it only happens right before magic crashes). Or. . . just rewrite some of this so that magic doesn't try to move the region's tile off of the split tile? (There was one instance of this. Changed it and re-running). Presumably was a good thing to do, but didn't change anything regarding the crash condition. Hm. But also: ExtBasic.c:4547 is doing the same thing. Changed that, still no luck. Grrr. Might be the missing treg_type in ExtLabFirst, which would need to be added; the routine does not depend on the tile type, but would still require the dinfo to be saved. And. . . Still no luck. *sob* "sublist" is related to sticky labels and is not being freed under some circumstances. I don't think this is related, but should be fixed. Huh. Maybe try a careful check between the original and new versions of each of these files? extract/ExtRegion.c extract/ExtNghbors.c extract/ExtHard.c extract/ExtSubtree.c extract/ExtHier.c If all else fails, create a routine to dump a list of tiles being set to regions, and tiles being cleared of regions, to be activated on POWER_RAIL_COR_0, so that the complete list of tiles being visted to set regions and tiles being visited to clear regions can be compared directly. Actually, this is probably more productive than looking at the file differences. Run again so that at the point of failure, can move up the call stack to find a routine where a node or def can be checked for turning the diagnostic on or off. When valgrind catches the use of freed memory, ExtLabelRegions def->cd_name is "POWER_RAIL_COR_1", although the last printed statement said that POWER_RAIL_COR_0 was being extracted; I think it's because this is a use. The routine common to both the error and the place where the memory was freed is extSubtreeHardUseFunc(). So: 1) At extSubtreeHardUseFunc, if use->cu_id is "POWER_RAIL_COR_1_0", enable the diagnostic 2) With the diagnostic enabled, list every region and every tile encountered by ExtFindNeighbors() that is connected to that region. Or maybe can get more targeted than that? Oh, no, . . . when I output diagnostics, the error doesn't occur. . . So how does LVS validation do? The diagnostic output is long. May need to redo it as two files, so that the "set" and "reset" lists can be checked side by side for any discrepancies. But if no error occurs when diagnostic output is enabled, then how can I catch the error? Well, divergent behavior *did* show up. At line 22266 of the output: It appears that ExtFindNeighbors() called from extHardFreeAll() stopped after the first tile. Tile @(46528 17492) type 0x50000075 Confirmed that this is (1) Not the first time that a split tile is the first encountered, BUT (2) This is the first time that a split tile is encountered with the active tile type on the left. Need to redo this and print the diagonal information. Not really necessary, though. Can stop printing diagnostics now and concentrate on finding what happens when ExtFindNeighbors encounters the tile at (46528, 17492) immediately after being called from extRegionAreaFunc or from FreeAll. Now can break on ExtNghbors.c:138 and 142 when tile->ti_ll.p_x == 46528 && tile->ti_ll.p_y == 17492 and see what's going on. Dinfo is 0x40000000 = TT_DIAGONAL. topside skips. leftside is run, pushes m3 (not split) tile at 42868, 17492. Dinfo is 0x70000075 --> TT_SIDE has been set here, but should not have been. Moving up, treg_type has been set to 0x70000075 for this region. Note that treg_ll is not the location of treg_tile (treg_ll = 14000, 42497). Need to find when treg_type was set inappropriately. Note that extractInt.h says that "treg_type" is the type of treg_tile, which was changed from "type of tile that contains treg_ll", which may be an indicator of the issue. . . Watch where treg_type is set in ExtBasic.c and ExtHard.c. . . We've got: ExtHard.c:91 ExtBasic.c:4610 (not relevant) Setting reg->treg_type to dinfo is missing from extTransFirst. Still not clear what's going on. The "labRegList" generated by ExtFindRegions() should be the same one as originally added in extLabFirst(). So rerun (again), break as above on ExtHard.c:91, and track when the reg->treg_type changed. Looks like "extSetNodeNum" is the culprit. The type is changed to the new tile representing the lower left-hand corner and plane. It is not immediately clear if not saving dinfo with the the lreg_ll and lreg_pnum information will cause problems, but that information should be recoverable in other ways (i.e., if the tile at point lreg_ll on lreg_pnum is split and the type at lreg_ll is not lreg_type, then the side must be changed). $$#!@ still caused a segfault. Okay, I screwed something up badly. The two processes now diverge on the very first call. Now fails at tile (42868, 14000) I do see an error, so try again. . . Ah, some light at the end of the tunnel! Maybe joy! Looks right. Removing diagnostics from the code and re-running with valgrind. . . . And now it seems to have gone into an infinite loop. But I forgot to do a "make install" and may have just caused a massive issue. Did the install and re-ran. POWER_RAIL_COR does take a long time to run, so each parent cell will be worse, so it's likely just an issue with this cell, extraction, and valgrind. Let it run to completion, and then later compare to running without valgrind. Make sure that in both cases, the final netlist result passes LVS. (Conclusion: Yes, it finished running under valgrind, eventually, and valgrind did not have any more issues. But that was a long running process and I will try to do that as little as possible.) Side note: This example has another interesting feature which is that halfway through, it goes back to the prompt, which is a known issue with "extract" but it hasn't been obvious how it happens. It should be possible to Ctrl-C out of a long-running extraction but it should *not* be possible to run commands while the extraction is ongoing. It appears to happen because only part of the design is loaded when extraction starts. When magic goes to load the rest of the design, it returns to the prompt. --------------- NOTE to self: The main thing now needing handling for extraction is to have extGetRegion(tp, dinfo) and call this appropriately everywhere. Where tp is part of a boundary record, it should be possible to derive dinfo. --------------- For now, from January 5: Back to general checks: Repeating the list from above: Need to test: GDS output (passed) GDS input (passed) extraction (pending) (fixed cap coupling issues) net selection (seems okay) antenna checks (okay) LEF read (okay) LEF write (okay) DEF read* DEF write* extresist (* knowing that there are some errors with DEF read/write that are unrelated) "def write" appears to have taken an excessively large amount of memory. This is probably not related to recent code changes but should be investigated. While the design tested is large, it is not large to the tune of 32GB+, which seems to be taken up entirely by defblockageVisit. This is unreasonable and must be fixed. Tested extraction on sky130_fd_io__top_gpiov2_flat, which crashed immediately; however, it is known that it has split tiles with different nodes and will require split nodes to be handled properly. Make sure that's the issue, though. No, actually it's extAddOverlap needing an extra argument. ---------------------------------- Running some of the I/O torture test from sky130 in ~/projects/efabless/sky130_fd_io/. These tests are important because they are part of the reason for fixing the nonmanhattan code, since the requirement of setting two regions per tile is needed for several cells in this I/O set. In lvs_tests/ First pass, running "run_top_sio.sh" resulted in magic hanging in DRCFindInteractions() while extracting "sky130_fd_io__sio_ipath_com". Here, drcSubcellFunc() is getting called alternately on uses sky130_fd_io__sio_com_m2m3_strap_5 and sky130_fd_io__sio_com_m2m3_strap_6. Given the recent work aroudn DRCFindInteractions() there is a good chance this has nothing to do with split tiles. (Confirmed) Uh oh. subUse = sky130_fd_io__sio_com_m2m3_strap_5 subUse->cu_bbox = 627, 3428 to 1485, 214751792 which sounds bogus. subUse = sky130_fd_io__sio_com_m2m3_strap_6 subUse->cu_bbox = 1615, 3248 to 2473, 214641792 which is equally bogus. But this appears to derive from the .mag files in the library. Bogus entry is in the sky130_fd_io__sio_com_m2m3_strap.mag file: "rect 164 319 214748364 321" on layer "comment". Remove this entry and correct the "box" entries in "sky130_fd_io__sio_ipath_com" to "0 0 364 858". This requires a separate investigation. I have not compiled the sky130 PDK for a while. There are some arrows drawn with comment that appear to have been mangled on GDS input. They should probably be removed from the database. However, this suggests an issue with GDS read-in. There are multiple "strap" layouts, all of which have this issue. Need to recheck the GDS read-in. Maybe just rebuild the sky130 PDK (using the previous version of magic)? That seems to have corrected the issue, which might have been caused by building the PDK with a bad version of magic. Doing "run_top_sio.sh" works now, although with the same errors as it had historically (waiting for proper handling of regions on split tiles). The "run_top_sio.sh" script now runs with surprisingly few issues. Three metal1 resistors are missing and the grounds are not cleanly separated, and very little else. "top_pwrdetv2" had been a problem but now succeeds, which is pretty significant. --------------- Split region handling: Still need to fix boundary checks: extTransPerimFunc(), extSideLeft(), etc., etc. Look for "(TileType)0" for places that need fixing. Boundaries: Should define directions for non-Manhattan tiles. b_inside = b_outside, b_segment follows the diagonal. Replace extUnInit with CLIENTDEFAULT and remove extUnInit as a global variable, as that is ridiculous. (Done, along with associated stupidity extNbrUn and also passing the value to ExtFindRegions().) Need to understand these functions better. . . For example, ignoring the coupling cap stuff for now, extOutputDevices() scans transList, sets tr_perim = 0 calls ExtFindNeighbors() from the region's tile (a tile belonging to the device) arg.fra_each = extTransTileFunc. Initial perimeter is 0. For each tile called back by ExtFindNeighbors, call extEnumTilePerim() with function extTransPerimFunc(). Anything currently with (TileType)0 or which calls simply TiGetType() needs fixing: extSideCommon(): Pass boundary and use extGetBoundaryTypes() Okay, but this still does not account for everything that needs to be done when checking coupling between non-Manhattan edges. But it should keep things from crashing or producing stupid results. Oh, no. fra_uninit is being used to process ExtFindNeighbors with a specific node like the transistor gate being considered "uninitialized". ExtNghbors.c:247 --- Need to handle separately; move "continue" down into each of the conditionals (done) ExtNghbors.c:137, 187 --- Set dinfo appropriately for top and bottom sides. (done) (may complete the handling of ExtFindNeighbors() and also properly eliminate extNbrUn as a global variable.) Whew. Is that all? (Almost certainly not.) Yes, missed code at ExtBasic.c:5203 and below. (fixed) Okay, it compiles again! Time to test again!