Post Reply 
WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
10-23-2014, 05:20 PM
Post: #21
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
8 Queens on steroids Smile
Breaking news: a little teak dropped the runtime to 1.65s (runs 10 times in 165 ticks) which is about 4 times faster than the non bitmap program for the wp34s.

Code:
148     [f] LBL D       
149     [f] BASE 16     
150     WSIZE 32        
151     [h] CLx 
152     STO C           loop counter
153     STO I           Stack pointer
154     1       
155     0       
156     1       
157     0       
158     1       
159     0       
160     1       
161     STO A           0x1010101
162     8       
163     STO B           remaining Q
164     [h] MASKR 08    
165     [h] SL 01       
166     STO 00          0x1FE
                
167     [f] LBL 02      x=bitmap y=Q position
168     INC C   
169     [h] BS?->Y      test Q position
170     [h] GTO 05      Q OK, next line
                
171     [f] LBL 03      x=bitmap y=Q position
172     [h] DSZ Y       SHIFT QUEEN
173     [h] GTO 02      RETRY
174     INC B           otherwise backtrack
175     RCL->I          pop a Q
176     DEC I           pop a Q
177     RCL->I          pop a Q
178     DEC I           pop a Q
179     [h] GTO 03      
                
180     [f] LBL 05      x=bitmap y=Q position
181     INC I   
182     STO->I          Save bitmap
183     X<>Y    
184     INC I   
185     STO->I          save Q
186     X<>Y    
187     DEC B
188     RCL A           get Q mask                                
189     [h] RL->Z       shift                                     
190     [h] OR          merge with bitmap                         
191     RCL 00                                                    
192     [h] NOT                                                   
193     [h] AND         clear combination                         
194     [h] RR 09                                                 
195     [h] RRC 08      shift diagonal                            
196     RL 16                                                     
197     [h] RLC 08      shift diagonal                            
198     [h] RR 07                                                 
199     ENTER                                                     
200     ENTER                                                     
201     [h] RL 08                                                 
202     [h] OR          ORing L V R                               
203     ENTER                                                     
204     [h] RL 16                                                 
205     [h] OR          ORing L V R                               
206     RCL 00                                                    
207     [h] AND         mask Combination                          
208     RCL L                                                     
209     [h] XOR         negate Combination                        
210     [f] X=0?        if 0                                      
211     [h] GTO 06      deadend                                   
212     [h] OR          merge Combination to bitmap               
213     8                                                         
214     X<>Y                                                      
215     [h] GTO 02                                                
                                                               
216     [f] LBL 06      Shortcut: all queens locked on the line   
217     RCL B                                                     
218     X=0?            if B = 0                                  
219     GTO 04          finished                                  
220     8                                                         
221     STO+C                                                     
222     INC B           pop a queen                               
223     RCL->I                                                    
224     DEC I                                                     
225     RCL->I                                                    
226     DEC I                                                     
227     [h] GTO 03                                                
                                                               
228     [f] LBL 04                                                
229     RCL C                                                     
230     BASE 10                                                   
231    [g] RTN
Thanks IceMan for doing the timing
PS: there is still a few drops to squeeze by switching from LBL/GTO to BACK/SKIP.

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-23-2014, 10:11 PM
Post: #22
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-22-2014 01:10 PM)Claudio L. Wrote:  30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection.

Quoting myself for reference here.
I did a few tweaks and created 50g RPL versions of the problem with the following conditions:

1) Original code
2) Stack code, using the values in the stack rather than a list
3) Original code with double indirection removed
4) Stack code, with double indirection removed
5) Bitmap code

I'll download and post the code for all cases over the weekend.
Right now let's go straight to the point. These results are on a 50g at normal speed:

1) Original code = 90 seconds
2) Stack code = 60.6 seconds
3) Original w/no indirection = 67.6 seconds
4) Stack code w/no indirection = 57.7 seconds
5) Bitmap code = 54.3 seconds

A few observations:
3 vs. 1: When using (slow) lists, removing that indirection has a big impact. The advantage comes not much from the read indirection, but mainly from not writing the value back to a list (which forces creation of an entire list every time) until the end of the inner loop.
2 vs 1: As I mentioned before: lists are slow. Using the stack provides a boost.
2 vs 3: When you remove the list creation from the inner loop, the effect of the lists is not as bad, so the difference is smaller.
4 vs 2: With lists out of the picture, the penalty for the indirection within the stack is minor (as it should be).
5 vs 2, or 5 vs 4: The Saturn CPU isn't particularly good at bit manipulations, so this technique is not such a massive improvement over the normal solution. Still edges faster.

Conclusions:
a) The effect of the indirection is large when using lists (PUT much more than GET).
b) The effect of the indirection removal is small when using the stack
c) The bitmap code is still faster than the normal approach, but not by much in this model.

Claudio
Find all posts by this user
Quote this message in a reply
10-24-2014, 11:32 AM
Post: #23
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
To Claudio: I can't wait to read about your work.

To Xerxes: After studying a few programs from your benchmark, I see that the programs fall into 3 categories:
1) programs that can find all the solutions AND quit nicely.
2) programs that can find all the solutions but fail after the last solution.
3) programs that can find only solutions with the first queen at position 8.

My bitmap programs fall into category 2.
The HP-15C program fall into category 3 Sad which means the programmer used the knowledge that the first queen is 8 and don't need to be moved to find the first solution.

Obviously, a category 3 program will be simpler and faster.
Is it in the spirit of the benchmark ?

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-24-2014, 01:00 PM (This post was last modified: 10-24-2014 05:33 PM by xerxes.)
Post: #24
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-24-2014 11:32 AM)patrice Wrote:  To Xerxes: After studying a few programs from your benchmark, I see that the programs fall into 3 categories:
1) programs that can find all the solutions AND quit nicely.
2) programs that can find all the solutions but fail after the last solution.
3) programs that can find only solutions with the first queen at position 8.

The original program was designed to find the first solution only. Later I was forced to make a structured version,
due to the lack of GOTO in some languages, but it's the same way to the solution. The fact, that the unstructured version
is also able to find all solutions, is a side effect and was not intended.

I've tried to comprehend your observation about category 3 using the FX-603P, but I'm not able to see what you mean exactly.
The algorithm starts with putting the first queen at position 8, without the knowledge that it's not necessary to move
it later for the first solution only. Even if I put the queen at position 8 and start calculating at the second queen, it
doesn't change the execution time significantly.

I've also tried to start the calculation with the first queen in other positions without failing, but of course different execution times.

Calculator Benchmark
Find all posts by this user
Quote this message in a reply
10-26-2014, 01:55 AM
Post: #25
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-23-2014 10:11 PM)Claudio L. Wrote:  These results are on a 50g at normal speed:

1) Original code = 90 seconds
2) Stack code = 60.6 seconds
3) Original w/no indirection = 67.6 seconds
4) Stack code w/no indirection = 57.7 seconds
5) Bitmap code = 54.3 seconds

Add to that:
6) New in-stack recursive algorithm = < 30 seconds

Still working on cleaning up the sources for publication. I messed up and inadvertently overwrote a subroutine of my bitmap solution (was trying to improve it and STOred in the wrong name), so I have to see if I can write it back from scratch now.
But then I thought of starting an algorithm from scratch that uses the stack. The new algorithm uses the fact that queens can't share the same column, so it removes it from the possible locations of the next queens, being significantly faster than the bitmap solution.

The idea is: for the first queen, you have 8 possible locations (1 thru 8), which are numbers left in the stack. The first queen takes a number, leaving the other 7 positions in the stack, and recursively calls for the next queen to do the same. Of course, each queen takes a number from the list and verifies that doesn't conflict with any diagonals of previous queens (only the diagonals need to be checked since the column is guaranteed not to be occupied by another queen). If there's no conflict with the diagonals, recurses into the next queen, passing the remainder positions in the stack. If there is a conflict, it returns that position to the list (to the top) and pulls the next one.
For example, the second queen receives 1 2 3 4 5 6 7 (after the first queen takes the 8). Pulls a 7, checks that it's in the diagonal of the previous queen (in the 8), so it returns the 7 to the top of the list, leaving now 7 1 2 3 4 5 6, and now pulls the 6 and does the same. This list of available positions keeps "rotating" in the stack. If none of the N positions is good, then it returns failure to the previous queen. The previous queen will then rotate her own list and proceed to the next position.
For this to work, each queen needs to make her own duplicate of the list, which is kept also in the stack. So when you reach the last queen, the stack contains:
8 positions (1st queen) + 7 positions (2nd) + 6 positions (3rd) + 5 + 4 + 3 + 2 + 1 =36 numbers in the stack, plus the 8 selected positions of the queens = 44 stack levels needed (actually, 45 if you need to measure the speed, I leave the TICKS there too).
As is evident, the algorithm makes good use (and requires) the infinite stack from the 48/49/50 series (even the 28 should work).

Why so fast? Because even in the bitmap solution, each queen scans through the 8 positions, checking for a conflict in the bitmap. In this case, the last queen only has one possible position (all others were taken from the list) for which only has to check the diagonals. The second last only checks 2 possible positions, etc. This saves significant time.

I have to clean it up as well, but this one clocked at 29.8 seconds, a third of the original algorithm on the list, and half the time vs. the same algorithm using the stack.
Of course, this is not the same algorithm, so it's not directly comparable and cannot go in the list (although it follows the same solution path and explores the same nodes, producing the exact same solution). It's just to see how far we can go with this.

Claudio
Find all posts by this user
Quote this message in a reply
10-26-2014, 06:18 AM
Post: #26
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
In this book a recursive backtracking algorithm is presented:

N. Wirth, Algorithms and Data Structures (1985 edition, updated for Oberon in August 2004)

3.5 The Eight Queens Problem

From the preface:
Quote:Recursion is shown to be a generalization of repetition (iteration), and as such it is an important and powerful concept in programming. In many programming tutorials, it is unfortunately exemplified by cases in which simple iteration would suffice. Instead, Chap. 3 concentrates on several examples of problems in which recursion allows for a most natural formulation of a solution, whereas use of iteration would lead to obscure and cumbersome programs.

Kind regards
Thomas
Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 4 Guest(s)