Search of 17 clues puzzles in a solution grid


O End of part 1 process


All the preparation has been done to have the critical code as simple as possible, so here, the code is relatively short but going first trough that process can ease understanding of the preparation tasks.


O1 chunk process


The reader must keep in mind that the innermost loop in the chunk process will be called billions times for many entries. Each processor cycle saved has a true value.


The second key point is that most of the XY will be killed in the innermost loop (y loop in the chunk process)  . Using a bigger UA tables, I think that more than 95% of the XY don’t pass the first filter.


The third key point is that in most cases the lowest 64 UAs  will have at least one UA not hit, killing the XY 


And we keep in mind also that the chunk organization is done mainly to save cycles through the cache.


To optimize the cache, the chunk is first scanned to find XY passing the first filter. Such XY are “stored” in a relay buffer and the next steps are called in a second phase. 2 different ways to achieve that are proposed here.


O 1A Chunk Tested process


In the tested process, a chunk is made of 256’x’ 256’y’ (except for the last ‘x’ or last ‘y’).

256 bit fields of 256 bits each are used to mark XY passing the first filter

At the end of the chunk loop, the 256x256 bits are scanned to call the next steps


We have usually in the 2X2Y UAs table a relatively small number of UAs (<128) but the number of UAs will increase when small UAs are added coming out of the bands 1+2 brute force check.


To save cycles in the critical code, one loop has been coded for each number of  64 bits pieces in the vectors

Here is the code for 65-128 UAs in the 2X2Y table


              case 2:{// likely the highest frequency

                     for (int ix = 0; ix < nxc; ix++, px = &px[2], tbits = &tbits[4]){

                            register uint64_t *py = vyc;

                            for (iy = 0; iy < nyc; iy++, py = &py[2]){

                                   if (px[0] & py[0]) continue;

                                   if (px[1] & py[1]) continue;

                                   {// passing UAs filters flag it in       chunkbuf[256][4];

                                          register uint64_t B = 1; B <<= iy & 0x3f;// 1<<(iy%64)

                                          tbits[iy >> 6] |= B;







We see the double loop ix;iy 

The v[X] and v[Y] have  2 times 64 bits, so we have 2 tests

And the bit is set to ‘1’ if the XY hit all UAs


So far, a maximum of 8x64 bits  for the 2X2Y table seems reasonable, 8 such loops are coded with increasing numbers of lines if...


Test have been done with a maximum if 256 UAs


O 1b Chunk Alternative process


Here the chunk is tailored to have x,y relative index within 16 bits. Xsize is 64 and Ysize 1024.

A buffer of 64536 words receives the XY passing the first filter. Usually, the buffer will end nearly empty, but to be safe, enough memory has been reserved for the unlikely case where all XY would pass the first filter.


The filters are processed in the same 8 possible cases as in the first option.(1-64 UAs to  458-512 UAs for a given 2X2Y) The ‘X’ loop is here outside the switch option, but the inner loop is 4 times as big as in the first option. If the test show a penalty, the ’X’ loop can be move in the switch area.


              register int iy, store, ix, ipx = 0, Rvua = n64vua;

              register uint64_t *px = vxc;

              for (ix = 0; ix < nxc; ix++, px = &px[Rvua]){

                     store = ix << 10; //10 low bits for iy

                     register uint64_t *py = vyc;

                     switch (Rvua){

                     case 1:{

                            for (iy = 0; iy < nyc; iy++, py++){

                                   if (px[0] & py[0]) continue;

                                   *pstore++ = store | iy;// (ix<<10) | iy








O 2 XY process


For each XY passing the first filter (UAs table fully hit) the next step is prepared, collecting

The ‘X and the ‘Y XY_EXPAND situation

The vectors for the GUAs status.


The XY status per stack is worked out and checked, but a very small number of XY will be cancelled at that level.


UAs found in the brute force of previous XY in the 2X2Y pass are checked (must be hit as well)


Still valid sockets are searched and the table of bands 3 is scanned to verify that at least one band must be sent to “part 2”


And if at the end the XY could not be cancelled, the brute force is applied to look for uniqueness in bands 1+2.


If the Band is valid, then the “part 2” is called for all bands 3 still active.



O 2.1 checking the “more” UAs


The “more” UAs appear when a XY pass all filters to reach the last step, the brute force check for uniqueness.


If the check fails, a UA can be extracted from the first false solution. This UA is stored in one of 2 FIFO tables limited to 48 UAs each  (#define G17MORESIZE 48)


Such UAs can be used as filter for the next XY of the 2X2Y pass ( the smallest are added to the start table of UAs at the end of the 2X2Y pass.


UAs are split in 2 tables for practical reasons : smaller UAs have a better chance to have some effect later. Biggest UAs have chances to play a role in the near vicinity of the XY where they have been produced.


The way to load tables and to apply filters is the same for each table; this was the reason to use the

struct G17MORE to implement the code.


The filter is applied starting from the last UA loaded to end with the oldest one. The table of small UAs is checked first (the first UA not hit stops the process)


O 2.2 Extracting valid sockets


We defined GUAs for the gangster band 3, Before any attempt to look at the status per band 3, we have to find all sockets still valid.


Note: as said earlier, the process should consider sockets that can not be hit in the 2X2Y pass, what is not done so far. This is just a matter of performance.


The process has been carefully prepared at the 2X2Y level, so the code is relatively simple


The final vector GUAs is built in the vw area

All bits in the “results areas” are set to 0


The vector vw is scanned by pass of 64 bits.

If a bit ‘1’ is seen :

           The corresponding socket is set to ‘1’ in the results

           All GUAs of he same socket are ignored using the “socket vector”

             applied to the current 64 bits and to the next 64 bits

             (a dummy field has been added in vw to avoid the test of the last bloc)

At the end of the check, we have 2 bit fields, one for pairs and one for triplets having the same structure as the bit fields prepared in the band 3 analysis.


O 2.3 Build Active Bands3.


An XY having no band3 passing easy filters for band 3 can be ignored. This filter is less expensive than a brute force check and can play a key role in the performance when the number of bands 3 is small (usually a huge number of XY passing the upstream filters).


The process here prepares the band 3  “part 2” (so called in blue’s process.) All active GUA2s  of the band 3 studied are stored as bit field and as complementary bit in the mini row.


Triplets in a mini row (GUA3s) are accepted as one clue if no pair has been seen in the same mini row


Stacks receive the minimal count 

1 per active mini row

2 for the a mini row having 3 GUA2s active.


At the end, a band is cancelled if the minimum count exceed 6, the number of clues expected in band 3. If the count is 6, the number of clues expected in band 3, the band is called in “critical situation”. All clues must be taken out of the cells belonging to GUA2s GUA3s.


The data to be used later for the valid bands are stored in a struct G17TB3GO for later processing.



O 3. Brute force call CheckValidB12()


Here we just have a call to the brute force and we receive back the first false solution in UA mode if the XY is not a valid band 1+2. The UA is stored in one of the 2 FIFO tables depending on the number of cells of the UA.


The special brute force applied will be seen later in the chapter dedicated to the brute force. In this case, where the “good” solution is known, the guess sequence select in priority the “good” candidate. As the process is halted at the first “false” solution, we have a good chance at the end to get a relatively small UA.


Note : Normally several XY with the same ‘X’ are expected in sequence (more if the chunk is 64x-1024y), so the brute force initial ‘X’ process can be saved. This gives a small potential for improvement.


O 4. Preparing the “part 2” processing of still valid bands. G17XY::Go_ValidB12()


This is in fact the preliminary task of “part 2” the processing of  the valid bands 3.


A small common task is done at the start, the preparation of the final brute force for the bands 1+2 part of the final solution to test.


The rest is the outer most loop if the “part 2”. The loop is done on the table created in



At the top of the loop, the brute force receive the band 3 solution to use for the final check.


O.4.1 General strategy for “part 2”


In the general overview, the general strategy for the band 3 processing has been shortly described,.


I called

“field 1” the set of cells belonging to the GUA2s GUA3s

“field 2” the rest of the cells in band 3.


The general strategy will be to assign cells in “field 1” at the end.


The situation in band 3 is qualified of “critical” when the minimum count of cells to assign in GUA2s GUA3s  is equal to the number missing to have 6 clues in band 3. In the test made on a big sample, about 45% of the bands 3 processed where in a “critical” state at the start of “part 2”.


Out of GUA2s GUA3s, the selection of cells to assign will use the set of UAs of the band 3 plus GUA4s GUA6s.


These UAs will be split in 2 tables


“T1” for UAs having at least one cell belonging to the GUA2s GUA3s

“T2” for Other UAs.


If the status of the band is “critical” at the start, T2 must be empty, all the clues will be in “field 1”.


The first task done here is to extract GUA4s GUA6s and to build T1 and T2.

As the process for band3 is widely in recursive mode, T1 and T2 are located in the XY calling structure. The size of the memory locked has been set to have room for shrunk tables after clues assignments.  If this is done, the shrunk T1/T2 is put at the bottom of the parent table.


This seems to much work to be done before the control that the XY is a valid band.


Note: After the tests on the fist draft of the band 3 process, it appeared better (explained later) to go to the brute force immediately when the situation is “critical” at the start of “part 2” (~ 45% of the bands). Doing that call before the UAs extraction and split  should give a better solution. As this easy to implement, it will be in a good position in the “to do” list


O 4.2 GUA4s GUA6s new UAs for a band 3


In fact, as said in the overview, once a XY is established as valid band 1+2, al clues added will be in band 3. So all GUAs are reduced to the band 3 part of the GUAs.


GUA2s and GUA3s are new UAs for the band 3, but as they are “quasi native disjoints “UAs”  they are processed in a special way.


GUA4s GUA6s, UAs having a pair socket with the 2 cells in different mini rows are just added to the known minimum set of UAs specific to each band 3 (morphed at the very beginning out of the constant table of all bands UAs)


As GUA4s GUA6s have a small number of cells, they are put at the top of the final band 3 UAs in T1 and T2.


During the analysis of the bands 3 in the preliminary task, a bitmap of the GUA4s GUA6s per socket was created and an index pointing on the corresponding UA was built, so  extraction of GUA4s GUA6s is relatively simple


A) “and” the pair sockets bitmap  with the bitmap of GUA4s GUA6s for the band

B) Extract valid bits in the resulting bitmap and use the index to find the corresponding GUA4/GUA6


O 4.3  Recursive process for band 3


Up to now, the process was perfectly controlled in loops and for some fix areas could be used to process

a 2X2Y step;

a chunk step;

an XY passing the first filter.


The band 3 process is a complex mix of 

Assignments based on a UA cells

Loops finding “single cells to assign”

Chains of steps in “non critical” and “critical areas”


The process will be explained in the next chapter. As in the brute force, it is done in recursive mode with the structure G17HANDLER  as host of the recursive. data.


The start  G17HANDLER is defined in G17XY, The same G17HANDLER is used as long as no “trial assignment” is done. When such a “trial” is done, a new  G17HANDLER  is created in the program stack to continue the process. The old handler is first copied in the new handler.


Note: in the brute force, the last “trial” for a group of values is done in the existing handler. This saves time and resources. This is not yet implemented in the current code


O 5 Warning


The code released has small changes compared to the code tested. And a lot of cleaning took place to have a version easier to read and to validate. Regressive tests must be done. I committed myself to deliver all the code by the 24th of September, so the corresponding tests will start after the release containing a full draft of the program.