WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
10-23-2014, 05:20 PM
Post: #21
 patrice Member Posts: 184 Joined: Dec 2013
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
8 Queens on steroids
Breaking news: a little teak dropped the runtime to 1.65s (runs 10 times in 165 ticks) which is about 4 times faster than the non bitmap program for the wp34s.

Code:
148     [f] LBL D        149     [f] BASE 16      150     WSIZE 32         151     [h] CLx  152     STO C           loop counter 153     STO I           Stack pointer 154     1        155     0        156     1        157     0        158     1        159     0        160     1        161     STO A           0x1010101 162     8        163     STO B           remaining Q 164     [h] MASKR 08     165     [h] SL 01        166     STO 00          0x1FE                  167     [f] LBL 02      x=bitmap y=Q position 168     INC C    169     [h] BS?->Y      test Q position 170     [h] GTO 05      Q OK, next line                  171     [f] LBL 03      x=bitmap y=Q position 172     [h] DSZ Y       SHIFT QUEEN 173     [h] GTO 02      RETRY 174     INC B           otherwise backtrack 175     RCL->I          pop a Q 176     DEC I           pop a Q 177     RCL->I          pop a Q 178     DEC I           pop a Q 179     [h] GTO 03                        180     [f] LBL 05      x=bitmap y=Q position 181     INC I    182     STO->I          Save bitmap 183     X<>Y     184     INC I    185     STO->I          save Q 186     X<>Y     187     DEC B 188     RCL A           get Q mask                                 189     [h] RL->Z       shift                                      190     [h] OR          merge with bitmap                          191     RCL 00                                                     192     [h] NOT                                                    193     [h] AND         clear combination                          194     [h] RR 09                                                  195     [h] RRC 08      shift diagonal                             196     RL 16                                                      197     [h] RLC 08      shift diagonal                             198     [h] RR 07                                                  199     ENTER                                                      200     ENTER                                                      201     [h] RL 08                                                  202     [h] OR          ORing L V R                                203     ENTER                                                      204     [h] RL 16                                                  205     [h] OR          ORing L V R                                206     RCL 00                                                     207     [h] AND         mask Combination                           208     RCL L                                                      209     [h] XOR         negate Combination                         210     [f] X=0?        if 0                                       211     [h] GTO 06      deadend                                    212     [h] OR          merge Combination to bitmap                213     8                                                          214     X<>Y                                                       215     [h] GTO 02                                                                                                                 216     [f] LBL 06      Shortcut: all queens locked on the line    217     RCL B                                                      218     X=0?            if B = 0                                   219     GTO 04          finished                                   220     8                                                          221     STO+C                                                      222     INC B           pop a queen                                223     RCL->I                                                     224     DEC I                                                      225     RCL->I                                                     226     DEC I                                                      227     [h] GTO 03                                                                                                                 228     [f] LBL 04                                                 229     RCL C                                                      230     BASE 10                                                    231    [g] RTN
Thanks IceMan for doing the timing
PS: there is still a few drops to squeeze by switching from LBL/GTO to BACK/SKIP.

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
10-23-2014, 10:11 PM
Post: #22
 Claudio L. Senior Member Posts: 1,830 Joined: Dec 2013
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-22-2014 01:10 PM)Claudio L. Wrote:  30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection.

Quoting myself for reference here.
I did a few tweaks and created 50g RPL versions of the problem with the following conditions:

1) Original code
2) Stack code, using the values in the stack rather than a list
3) Original code with double indirection removed
4) Stack code, with double indirection removed
5) Bitmap code

I'll download and post the code for all cases over the weekend.
Right now let's go straight to the point. These results are on a 50g at normal speed:

1) Original code = 90 seconds
2) Stack code = 60.6 seconds
3) Original w/no indirection = 67.6 seconds
4) Stack code w/no indirection = 57.7 seconds
5) Bitmap code = 54.3 seconds

A few observations:
3 vs. 1: When using (slow) lists, removing that indirection has a big impact. The advantage comes not much from the read indirection, but mainly from not writing the value back to a list (which forces creation of an entire list every time) until the end of the inner loop.
2 vs 1: As I mentioned before: lists are slow. Using the stack provides a boost.
2 vs 3: When you remove the list creation from the inner loop, the effect of the lists is not as bad, so the difference is smaller.
4 vs 2: With lists out of the picture, the penalty for the indirection within the stack is minor (as it should be).
5 vs 2, or 5 vs 4: The Saturn CPU isn't particularly good at bit manipulations, so this technique is not such a massive improvement over the normal solution. Still edges faster.

Conclusions:
a) The effect of the indirection is large when using lists (PUT much more than GET).
b) The effect of the indirection removal is small when using the stack
c) The bitmap code is still faster than the normal approach, but not by much in this model.

Claudio
10-24-2014, 11:32 AM
Post: #23
 patrice Member Posts: 184 Joined: Dec 2013
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution

To Xerxes: After studying a few programs from your benchmark, I see that the programs fall into 3 categories:
1) programs that can find all the solutions AND quit nicely.
2) programs that can find all the solutions but fail after the last solution.
3) programs that can find only solutions with the first queen at position 8.

My bitmap programs fall into category 2.
The HP-15C program fall into category 3 which means the programmer used the knowledge that the first queen is 8 and don't need to be moved to find the first solution.

Obviously, a category 3 program will be simpler and faster.
Is it in the spirit of the benchmark ?

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
10-24-2014, 01:00 PM (This post was last modified: 10-24-2014 05:33 PM by xerxes.)
Post: #24
 xerxes Member Posts: 116 Joined: Jun 2014
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-24-2014 11:32 AM)patrice Wrote:  To Xerxes: After studying a few programs from your benchmark, I see that the programs fall into 3 categories:
1) programs that can find all the solutions AND quit nicely.
2) programs that can find all the solutions but fail after the last solution.
3) programs that can find only solutions with the first queen at position 8.

The original program was designed to find the first solution only. Later I was forced to make a structured version,
due to the lack of GOTO in some languages, but it's the same way to the solution. The fact, that the unstructured version
is also able to find all solutions, is a side effect and was not intended.

I've tried to comprehend your observation about category 3 using the FX-603P, but I'm not able to see what you mean exactly.
The algorithm starts with putting the first queen at position 8, without the knowledge that it's not necessary to move
it later for the first solution only. Even if I put the queen at position 8 and start calculating at the second queen, it
doesn't change the execution time significantly.

I've also tried to start the calculation with the first queen in other positions without failing, but of course different execution times.

Calculator Benchmark
10-26-2014, 01:55 AM
Post: #25
 Claudio L. Senior Member Posts: 1,830 Joined: Dec 2013
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-23-2014 10:11 PM)Claudio L. Wrote:  These results are on a 50g at normal speed:

1) Original code = 90 seconds
2) Stack code = 60.6 seconds
3) Original w/no indirection = 67.6 seconds
4) Stack code w/no indirection = 57.7 seconds
5) Bitmap code = 54.3 seconds

6) New in-stack recursive algorithm = < 30 seconds

Still working on cleaning up the sources for publication. I messed up and inadvertently overwrote a subroutine of my bitmap solution (was trying to improve it and STOred in the wrong name), so I have to see if I can write it back from scratch now.
But then I thought of starting an algorithm from scratch that uses the stack. The new algorithm uses the fact that queens can't share the same column, so it removes it from the possible locations of the next queens, being significantly faster than the bitmap solution.

The idea is: for the first queen, you have 8 possible locations (1 thru 8), which are numbers left in the stack. The first queen takes a number, leaving the other 7 positions in the stack, and recursively calls for the next queen to do the same. Of course, each queen takes a number from the list and verifies that doesn't conflict with any diagonals of previous queens (only the diagonals need to be checked since the column is guaranteed not to be occupied by another queen). If there's no conflict with the diagonals, recurses into the next queen, passing the remainder positions in the stack. If there is a conflict, it returns that position to the list (to the top) and pulls the next one.
For example, the second queen receives 1 2 3 4 5 6 7 (after the first queen takes the 8). Pulls a 7, checks that it's in the diagonal of the previous queen (in the 8), so it returns the 7 to the top of the list, leaving now 7 1 2 3 4 5 6, and now pulls the 6 and does the same. This list of available positions keeps "rotating" in the stack. If none of the N positions is good, then it returns failure to the previous queen. The previous queen will then rotate her own list and proceed to the next position.
For this to work, each queen needs to make her own duplicate of the list, which is kept also in the stack. So when you reach the last queen, the stack contains:
8 positions (1st queen) + 7 positions (2nd) + 6 positions (3rd) + 5 + 4 + 3 + 2 + 1 =36 numbers in the stack, plus the 8 selected positions of the queens = 44 stack levels needed (actually, 45 if you need to measure the speed, I leave the TICKS there too).
As is evident, the algorithm makes good use (and requires) the infinite stack from the 48/49/50 series (even the 28 should work).

Why so fast? Because even in the bitmap solution, each queen scans through the 8 positions, checking for a conflict in the bitmap. In this case, the last queen only has one possible position (all others were taken from the list) for which only has to check the diagonals. The second last only checks 2 possible positions, etc. This saves significant time.

I have to clean it up as well, but this one clocked at 29.8 seconds, a third of the original algorithm on the list, and half the time vs. the same algorithm using the stack.
Of course, this is not the same algorithm, so it's not directly comparable and cannot go in the list (although it follows the same solution path and explores the same nodes, producing the exact same solution). It's just to see how far we can go with this.

Claudio
10-26-2014, 06:18 AM
Post: #26
 Thomas Klemm Senior Member Posts: 1,447 Joined: Dec 2013
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
In this book a recursive backtracking algorithm is presented:

N. Wirth, Algorithms and Data Structures (1985 edition, updated for Oberon in August 2004)

3.5 The Eight Queens Problem

From the preface:
Quote:Recursion is shown to be a generalization of repetition (iteration), and as such it is an important and powerful concept in programming. In many programming tutorials, it is unfortunately exemplified by cases in which simple iteration would suffice. Instead, Chap. 3 concentrates on several examples of problems in which recursion allows for a most natural formulation of a solution, whereas use of iteration would lead to obscure and cumbersome programs.

Kind regards
Thomas
 « Next Oldest | Next Newest »

User(s) browsing this thread: 1 Guest(s)