Threaded Mode | Linear Mode

patrice · 10-23-2014, 05:20 PM

8 Queens on steroids Smile

Breaking news: a little teak dropped the runtime to 1.65s (runs 10 times in 165 ticks) which is about 4 times faster than the non bitmap program for the wp34s.

Code:

148     [f] LBL D       

149     [f] BASE 16     

150     WSIZE 32        

151     [h] CLx 

152     STO C           loop counter

153     STO I           Stack pointer

154     1       

155     0       

156     1       

157     0       

158     1       

159     0       

160     1       

161     STO A           0x1010101

162     8       

163     STO B           remaining Q

164     [h] MASKR 08    

165     [h] SL 01       

166     STO 00          0x1FE

167     [f] LBL 02      x=bitmap y=Q position

168     INC C   

169     [h] BS?->Y      test Q position

170     [h] GTO 05      Q OK, next line

171     [f] LBL 03      x=bitmap y=Q position

172     [h] DSZ Y       SHIFT QUEEN

173     [h] GTO 02      RETRY

174     INC B           otherwise backtrack

175     RCL->I          pop a Q

176     DEC I           pop a Q

177     RCL->I          pop a Q

178     DEC I           pop a Q

179     [h] GTO 03      

180     [f] LBL 05      x=bitmap y=Q position

181     INC I   

182     STO->I          Save bitmap

183     X<>Y    

184     INC I   

185     STO->I          save Q

186     X<>Y    

187     DEC B

188     RCL A           get Q mask                                

189     [h] RL->Z       shift                                     

190     [h] OR          merge with bitmap                         

191     RCL 00                                                    

192     [h] NOT                                                   

193     [h] AND         clear combination                         

194     [h] RR 09                                                 

195     [h] RRC 08      shift diagonal                            

196     RL 16                                                     

197     [h] RLC 08      shift diagonal                            

198     [h] RR 07                                                 

199     ENTER                                                     

200     ENTER                                                     

201     [h] RL 08                                                 

202     [h] OR          ORing L V R                               

203     ENTER                                                     

204     [h] RL 16                                                 

205     [h] OR          ORing L V R                               

206     RCL 00                                                    

207     [h] AND         mask Combination                          

208     RCL L                                                     

209     [h] XOR         negate Combination                        

210     [f] X=0?        if 0                                      

211     [h] GTO 06      deadend                                   

212     [h] OR          merge Combination to bitmap               

213     8                                                         

214     X<>Y                                                      

215     [h] GTO 02                                                

216     [f] LBL 06      Shortcut: all queens locked on the line   

217     RCL B                                                     

218     X=0?            if B = 0                                  

219     GTO 04          finished                                  

220     8                                                         

221     STO+C                                                     

222     INC B           pop a queen                               

223     RCL->I                                                    

224     DEC I                                                     

225     RCL->I                                                    

226     DEC I                                                     

227     [h] GTO 03                                                

228     [f] LBL 04                                                

229     RCL C                                                     

230     BASE 10                                                   

231    [g] RTN

Thanks IceMan for doing the timing
PS: there is still a few drops to squeeze by switching from LBL/GTO to BACK/SKIP.

Claudio L. · 10-23-2014, 10:11 PM

(10-22-2014 01:10 PM)Claudio L. Wrote: 30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection.

Quoting myself for reference here.
I did a few tweaks and created 50g RPL versions of the problem with the following conditions:

1) Original code
2) Stack code, using the values in the stack rather than a list
3) Original code with double indirection removed
4) Stack code, with double indirection removed
5) Bitmap code

I'll download and post the code for all cases over the weekend.
Right now let's go straight to the point. These results are on a 50g at normal speed:

1) Original code = 90 seconds
2) Stack code = 60.6 seconds
3) Original w/no indirection = 67.6 seconds
4) Stack code w/no indirection = 57.7 seconds
5) Bitmap code = 54.3 seconds

A few observations:
3 vs. 1: When using (slow) lists, removing that indirection has a big impact. The advantage comes not much from the read indirection, but mainly from not writing the value back to a list (which forces creation of an entire list every time) until the end of the inner loop.
2 vs 1: As I mentioned before: lists are slow. Using the stack provides a boost.
2 vs 3: When you remove the list creation from the inner loop, the effect of the lists is not as bad, so the difference is smaller.
4 vs 2: With lists out of the picture, the penalty for the indirection within the stack is minor (as it should be).
5 vs 2, or 5 vs 4: The Saturn CPU isn't particularly good at bit manipulations, so this technique is not such a massive improvement over the normal solution. Still edges faster.

Conclusions:
a) The effect of the indirection is large when using lists (PUT much more than GET).
b) The effect of the indirection removal is small when using the stack
c) The bitmap code is still faster than the normal approach, but not by much in this model.

Claudio

patrice · 10-24-2014, 11:32 AM

To Claudio: I can't wait to read about your work.

To Xerxes: After studying a few programs from your benchmark, I see that the programs fall into 3 categories:
1) programs that can find all the solutions AND quit nicely.
2) programs that can find all the solutions but fail after the last solution.
3) programs that can find only solutions with the first queen at position 8.

My bitmap programs fall into category 2.
The HP-15C program fall into category 3 Sad

which means the programmer used the knowledge that the first queen is 8 and don't need to be moved to find the first solution.

Obviously, a category 3 program will be simpler and faster.
Is it in the spirit of the benchmark ?

xerxes · (This post was last modified: 10-24-2014 05:33 PM by xerxes.)

(10-24-2014 11:32 AM)patrice Wrote: To Xerxes: After studying a few programs from your benchmark, I see that the programs fall into 3 categories:
1) programs that can find all the solutions AND quit nicely.
2) programs that can find all the solutions but fail after the last solution.
3) programs that can find only solutions with the first queen at position 8.

The original program was designed to find the first solution only. Later I was forced to make a structured version,
due to the lack of GOTO in some languages, but it's the same way to the solution. The fact, that the unstructured version
is also able to find all solutions, is a side effect and was not intended.

I've tried to comprehend your observation about category 3 using the FX-603P, but I'm not able to see what you mean exactly.
The algorithm starts with putting the first queen at position 8, without the knowledge that it's not necessary to move
it later for the first solution only. Even if I put the queen at position 8 and start calculating at the second queen, it
doesn't change the execution time significantly.

I've also tried to start the calculation with the first queen in other positions without failing, but of course different execution times.

Claudio L. · 10-26-2014, 01:55 AM

(10-23-2014 10:11 PM)Claudio L. Wrote: These results are on a 50g at normal speed:

1) Original code = 90 seconds
2) Stack code = 60.6 seconds
3) Original w/no indirection = 67.6 seconds
4) Stack code w/no indirection = 57.7 seconds
5) Bitmap code = 54.3 seconds

Add to that:
6) New in-stack recursive algorithm = < 30 seconds

Still working on cleaning up the sources for publication. I messed up and inadvertently overwrote a subroutine of my bitmap solution (was trying to improve it and STOred in the wrong name), so I have to see if I can write it back from scratch now.
But then I thought of starting an algorithm from scratch that uses the stack. The new algorithm uses the fact that queens can't share the same column, so it removes it from the possible locations of the next queens, being significantly faster than the bitmap solution.

The idea is: for the first queen, you have 8 possible locations (1 thru 8), which are numbers left in the stack. The first queen takes a number, leaving the other 7 positions in the stack, and recursively calls for the next queen to do the same. Of course, each queen takes a number from the list and verifies that doesn't conflict with any diagonals of previous queens (only the diagonals need to be checked since the column is guaranteed not to be occupied by another queen). If there's no conflict with the diagonals, recurses into the next queen, passing the remainder positions in the stack. If there is a conflict, it returns that position to the list (to the top) and pulls the next one.
For example, the second queen receives 1 2 3 4 5 6 7 (after the first queen takes the 8). Pulls a 7, checks that it's in the diagonal of the previous queen (in the 8), so it returns the 7 to the top of the list, leaving now 7 1 2 3 4 5 6, and now pulls the 6 and does the same. This list of available positions keeps "rotating" in the stack. If none of the N positions is good, then it returns failure to the previous queen. The previous queen will then rotate her own list and proceed to the next position.
For this to work, each queen needs to make her own duplicate of the list, which is kept also in the stack. So when you reach the last queen, the stack contains:
8 positions (1st queen) + 7 positions (2nd) + 6 positions (3rd) + 5 + 4 + 3 + 2 + 1 =36 numbers in the stack, plus the 8 selected positions of the queens = 44 stack levels needed (actually, 45 if you need to measure the speed, I leave the TICKS there too).
As is evident, the algorithm makes good use (and requires) the infinite stack from the 48/49/50 series (even the 28 should work).

Why so fast? Because even in the bitmap solution, each queen scans through the 8 positions, checking for a conflict in the bitmap. In this case, the last queen only has one possible position (all others were taken from the list) for which only has to check the diagonals. The second last only checks 2 possible positions, etc. This saves significant time.

I have to clean it up as well, but this one clocked at 29.8 seconds, a third of the original algorithm on the list, and half the time vs. the same algorithm using the stack.
Of course, this is not the same algorithm, so it's not directly comparable and cannot go in the list (although it follows the same solution path and explores the same nodes, producing the exact same solution). It's just to see how far we can go with this.

Claudio

Thomas Klemm · 10-26-2014, 06:18 AM

In this book a recursive backtracking algorithm is presented:

N. Wirth, Algorithms and Data Structures (1985 edition, updated for Oberon in August 2004)

3.5 The Eight Queens Problem

From the preface:

Quote:Recursion is shown to be a generalization of repetition (iteration), and as such it is an important and powerful concept in programming. In many programming tutorials, it is unfortunately exemplified by cases in which simple iteration would suffice. Instead, Chap. 3 concentrates on several examples of problems in which recursion allows for a most natural formulation of a solution, whereas use of iteration would lead to obscure and cumbersome programs.

Kind regards
Thomas