Search of 17clues phase 2

http://pagesperso-orange.fr/gpenet/SK17/PH2/ph2.htm

Where we are, the reader is supposed to have covered all comments about the phase 1 of the process

 

Quick summary of previous comments

 

The search of 17 clues Sudoku “as of blue” in solution grids can be divided in four sub tasks

 

Produce a pseudo entry file band1+band2 + relevant attached bands3

Do preliminary tasks

           UA generation

           Valid bands expansion

           ….

Find Bands 1+2 of interest

Search band 3 in bands 1+2 having potential

 

In the entry in phase 2,

 

We had produced an entry bands 1+2 plus the relevant set of bands 3 attached,

We have seen a valid XY {(valid band1+valid band 2) ; valid against gangster 1+2} t

This XY has not more than 6 clue known in stack.

All valid GUAs sockets have been worked out,

Some bands 3 included compulsory clues coming out of the valid GUAs have no stack with more then 6 clues.

 

Each band 3 not erased for this XY is an entry for the phase 2.

 

A rough estimate lets us expect 1/10 000 of the XY passing the first filters and half of the attached bands 3 requiring the call to the phase 2. But assuming one billion of XY, this is still a big number, so, even if the code is not as critical as for the first filter, having still have to find 6 clues, the design of the phase 2 is very important.

 

One effect on the preparation tasks is on the generation of sockets UAs. A better response on valid GUA2s—GUA6s will save time in phase 2.

 

 

 Data available at the entry in phase 3

 

We have at this point

 

The XY list of clues

The UAs specific to the gangster in band 3 (the band 3 must be solved against it’s gangster)

All GUAs2 GUAs3 GUAs4 GUAs6 coming out of the sockets valid ( at least one socket UA has not been hit by the XY clues);

 

==================================================================

 

Disjoints UAs

 

Disjoint UAs is here a key factor.

By construction, 2 GUA2s GUA3s in different mini rows are disjoints. Each of them will produce a clue.

If in one mini row we have a GUA2 and a GUA3, the GUA2 is a subset of the GUA3, the GUA3 is ignored.

If in one mini row we have 2 GUA2s, the common cell hit both.

If in a mini row we have 3 GUA2s, then 2 cells are needed to hit both.

 

All this has already been used in phase 1 to set up the minimum count of clues per stack and filter part of the XY.

 

But this does not cover the disjoint UAs potential.

If we consider all cells belonging to the GUA2s GUA3s contributing to the count, we have the area where the compulsory clues out of the GUA2s GUA3s will be (In Field).

 

The band 3 UAs final list (including GUA4s GUA6s) can be split in 2 sub lists

A) UAs having at least one cell “in field”

B) UAs having no cell “in field”

 

If the B sub list is not empty, we have one or more disjoints UAs to consider out of this sub list. In the code, we have no attempt to extract disjoint UAs in this sub list, but the concept is used intensively.

 

Count of missing clues

 

Having split the UAs in 2 sub tables, the process is split depending on the number of missing clues (over the minimum given by GUA2s GUA3s).

 

The process is said “critical” if all clues are “in field”. Then, if B) sub list is not empty, the band can be discarded.

 

We can have from 0 to 6 missing clues.

 

Statistics show a majority of critical situations, and 0-1 missing clues is expected to cover more than 75% of the bands3

 

6 missing clues seems to be exceptional, but exists.

 

The global strategy will be to apply first clues “out field” (not hitting GUA2s GUA3s), then to finish in “sub critical mode”  where the missing clues are taken “in field”.

 

6 missing clues

 

This process is quite classical. All UAs are in the B) sub list. Each set of clues hitting all UAs must be tested as a possible 17.

 

However, the stack limit can be checked first. At any moment, UAs can be limited to “non critical stacks”, and if an empty UA shows up, the process can be stopped for this branch.

 

1 missing clue

 

With one missing clue, we must not have disjoint UAs inside the B) sub list. L UAs are combined in a logical & operation. Compulsory cells “out field” are there. If the & operation leads to an empty list, the band is discarded.

 

The process can end

critical  with one cell out field assigned  

And if the B) list is empty

           sub critical  missing clue taken “in field”

           Critical (missing clue taken out field.

 

Note: the process is called with more missing clues and some “out field” assigned clues. So, the B) list must be first cleaned from UAs hit by previously assigned cells “out field”

 

Global check for a unique solution

 

In the comments given by blue on his process, he explained that the brute force in the phase 2 was applied in “reverse mode”. Try with many clues, and reduce the count till the point where the solution is unique.

 

This is not the way things are done here, but something similar is implemented. In many points in the process, the missing clues have to be taken in a limited list. If the brute force applied using this list as given leads to a multiple solution, the process can be cancelled.

 

The parameters limiting the selection of cells vary in the process,

 

Working “in field” or “out field”

Stacks in critical situation (no clue “out field”)

UAs constraints

Already assigned cells in band 3

 

2-5 missing clues

 

The process applied is very similar in this range of missing clues.

Check if we have still a valid UA in the B list.

If yes take a small one (applying the critical stack constraint)

An empty UA is a dead branch

A valid UA is applied

No  UA forces to try each possible cell

 

And with no UA, we end in “sub critical mode”.

 

Any assignment gives a “stack constraint” adjustment.

 

Critical and sub critical mode

 

Except if we have 6 missing clues, the process always end in

 

critical mode (all remaining clues out of  GUA2s GUA3s) or

Sub critical mode: some redundant clues have to be taken “in field”.

 

In sub critical mode, the strategy is to choose first where to add missing clues, then to continue in critical mode.

 

So the logic is to explain first the critical mode   1 critical process, then to see how redundant clues are added  2 sub critical process

 

Then, we can see the process applied to add clues outfield. The case with one missing clue is special, the process for more missing clues is very similar, each step adding one clue end ending in the process for one less clue. See here