Loading [MathJax]/extensions/Safe.js


Post Reply 
WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
10-12-2014, 11:38 PM (This post was last modified: 10-15-2014 11:09 PM by iceman.)
Post: #1
WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
First of all, thanks to forum member Patrice for sharing a work in progress (His source code for the HP-16C N-Queen's Benchmark).

I have ported this code to the bit commands of the WP-34S which has resulted in doubling the speed of the N-Queens solution over the traditional WP-34S N-Queens benchmark.

Set Wordsize to 32.

Here is the code, starts at memory location 32. I also attach a .dat file so that this code can be loaded into the emulator:

Code:
    WP-34S    
============    
32    [f] LBL C        
33    [f] BASE 16        
34    3        
35    STO I        
36    [h] CLx        
37    STO 00    //Count    
38    1        
39    0        
40    1        
41    0        
42    1        
43    0        
44    1        
45    STO 02    //Queen    
46    8        
47    STO 01    //Line    
48    [h] MASKR->X        
49    X~Y        //Mask width specifier got pushed onto stack above the mask
50    R Dn        //Get rid of it
51    [h] SL 01        
52    STO -> I    //Mask    
53    8        
54    [h] X~I        
55    X~Y        
56    [f] LBL 02    //Byte to be tested is in Y.  Bit to test ins in I
57    INC  00    //Count    First increment the line counter        
58    [h] BS?->I        
59    [h] GTO 05                    
60    [f] LBL 03        
61    [h] DSZ I        
62    [h] GTO 02        
63    X~Y        
64    [h] X~I                    
65    INC  01    //Line    
66    [h] DSZ I        
67    RCL->I        
68    [h] DSZ I        
69    RCL->I        
70    X~Y        
71    [h] X~I        
72    X~Y        
73    [h] GTO 03                
74    [f] LBL 05        
75    X~Y        
76    [h] X~I        
77    RCL 02            //Queen    
78    X~Y        
79    [h] ISZ I        
80    STO->I        
81    STO J        //Save the rotation count in the "TEST" register
82    R Dn                //Get rid of it, and pull the rotation candidate into X
83    [h] RL->J        //Rotate by the rotation count. TODO might be better way to do this: RL -> -> I Then could drop two lines above
84    [h] OR                
85    DEC 01             //Line.    Note: Might be better way to do this like: DSZ
86    RCL 01        
87    [f] x=0?            //Are we done?    
88    [h] GTO 04    //Yes Done: Terminate    
89    R Dn                    //No, Not done    
90    RCL 03              //Mask    
91    [h] NOT        
92    [h] AND                
93    [h] RR 09                
94    [h] RRC 08        
95    [h] RL 16        
96    [h] RLC 08            
97    [h] RR 07        
98    ENTER        
99    ENTER        
100    ENTER                
101    [h] RL 08        
102    [h] OR        
103    ENTER                    
104    [h] RL 16        
105    [h] OR        
106    RCL 03    //Mask    
107    [h] AND        
108    RCL L        
109    [h] XOR        
110    [f] X=0?        
111    [h] GTO 06        
112    [h] OR        
113    [h] ISZ I        
114    STO->I        
115    8        
116    [h] X~I        
117    X~Y        
118    [h] GTO 02                
119    [f] LBL 06                
120    8                
121    STO + 00    //Count.    Increment count by 8
122    INC 01    //Line    Increment Line count by 1
123    RCL->I        
124    [h] DSZ I        
125    RCL->I        
126    X~Y        
127    [h] X~I        
128    X~Y        
129    [h] GTO 03                
130    [f] LBL 04        
131    RCL 00    //Count    
132    [g] RTN    //Solution: 8 4 1 3 6 2 7 5 In every second register above 3


Attached File(s)
.txt  wp34s.dat.txt (Size: 2 KB / Downloads: 6)
Find all posts by this user
Quote this message in a reply
10-14-2014, 05:17 AM
Post: #2
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Hi Tim,

Thank you for your request about this program, it was fun to comeback on it.

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-15-2014, 06:39 PM (This post was last modified: 10-15-2014 11:07 PM by iceman.)
Post: #3
IMPROVED 30% over previous post
Patrice Torchet has optimized the WP-34S benchmark posted above.

Performance statistics using WP34S Firmware version 3.3 on HP-30B Hardware for the "Traditional" WP-34S NQueens Benchmark (See Xerxes Article)

WP-34S "Traditional" mode: [f] H.d (DECM):------------------------>68 Ticks
WP-34S "Traditional" mode: BASE 10 : -----------------------------> 61 Ticks

Performance statistics using WP34S Firmware version 3.3 on HP-30B Hardware for the code below are as follows:

Torchet WP-34S Optimized:-------------------------------------------->21 Ticks

There are approximately 10 Ticks per second

Here is the optimized code:

Code:
140    [f] LBL C        
141    TICKS        
142    STO K        
143    [f] BASE 16        
144    WSIZE 32        
145    [h] CLx        
146    STO C    //loop counter    
147    STO I    //Stack pointer    
148    1        
149    0        
150    1        
151    0        
152    1        
153    0        
154    1        
155    STO A    //0x1010101    
156    8        
157    STO B    //remaining Q    
158    [h] MASKR 08        
159    [h] SL 01        
160    STO 00    //0x1FE    
            
161    [f] LBL 02    //x=bitmap y=Q position    
162    INC C        
163    [h] BS?->Y    //test Q position    
164    [h] GTO 05    //Q OK, next line    
            
165    [f] LBL 03    //x=bitmap y=Q position    
166    [h] DSZ Y    //SHIFT QUEEN    
167    [h] GTO 02    //RETRY    
168    INC B    //ootherwise backtrack    
169    RCL->I    //pop a Q    
170    DEC I    //pop a Q    
171    RCL->I    //pop a Q    
172    DEC I    //pop a Q    
173    [h] GTO 03        
            
174    [f] LBL 05    //x=bitmap y=Q position    
175    INC I        
176    STO->I    //Save bitmap    
177    X<>Y        
178    INC I        
179    STO->I    //save Q    
180    X<>Y        
181    DSZ B    //Finish if 0    
182    [h] GTO 01        
183    [h] GTO 04        
            
184    LBL 01        
185    RCL A    //get Q mask    
186    [h] RL->Z    //shift    
187    [h] OR    //merge with bitmap    
188    RCL 00        
189    [h] NOT        
190    [h] AND    //clear combination    
191    [h] RR 09        
192    [h] RRC 08    //shift diagonal    
193    RL 16        
194    [h] RLC 08    //shift diagonal    
195    [h] RR 07        
196    ENTER        
197    ENTER        
198    [h] RL 08        
199    [h] OR    //ORing L V R    
200    ENTER        
201    [h] RL 16        
202    [h] OR    //ORing L V R    
203    RCL 00        
204    [h] AND    //mask Combination    
205    RCL L        
206    [h] XOR    //negate Combination    
207    [f] X=0?    //if 0    
208    [h] GTO 06    //deadend    
209    [h] OR    //merge Combination to bitmap    
210    8        
211    X<>Y        
212    [h] GTO 02        
            
213    [f] LBL 06    //Shortcut    
214    8        
215    STO+C        
216    INC B        
217    RCL->I        
218    DEC I        
219    RCL->I        
220    DEC I        
221    [h] GTO 03        
            
222    [f] LBL 04        
223    RCL C        
224    BASE 10        
225    TICKS        
226    RCL K        
227    -        
228    STO K        
229    R Dn        
230    [g] RTN        
            
bitmap principle            
the bitmap contains 4 parts of 8 bits            
L    Left diagonal locks        
V    Vertical locks        
R    Right diagonal locks        
C    Combination    = L or V or R    This is the part tested (BS?)
            
LLLLLLLVVVVVVVVRRRRRRRRCCCCCCCCL            
register A            
00000001000000010000000100000001            
            
bitmap is rotated to have combination on bits 1 to 8            
register A is rotated too to match bitmap


Merci Patrice, it is a work of beauty!
Find all posts by this user
Quote this message in a reply
10-16-2014, 01:01 PM
Post: #4
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Just for completeness.
The Queens positions are in registers 2, 4, 6, 8 ....

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-16-2014, 04:30 PM
Post: #5
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-15-2014 06:39 PM)iceman Wrote:  Performance statistics using WP34S Firmware version 3.3 on HP-30B Hardware for the "Traditional" WP-34S NQueens Benchmark (See Xerxes Article)

WP-34S "Traditional" mode: [f] H.d (DECM):------------------------>68 Ticks
WP-34S "Traditional" mode: BASE 10 : -----------------------------> 61 Ticks

Thank you for sharing your test results. I've updated the list.

Calculator Benchmark
Find all posts by this user
Quote this message in a reply
10-16-2014, 06:59 PM
Post: #6
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Is it still the same algorithm?
I thought the idea was to have the same algorithm ported in all platforms, so they can be directly compared.
I'm sure other calculators will also get a boost if you change the algorithm. I didn't look at the new code in detail, but looks to me like a completely different algorithm.

Is this sufficiently close to the original algorithm so it can be included in the original list? Otherwise will introduce noise into an otherwise very neat collection of results.
Find all posts by this user
Quote this message in a reply
10-16-2014, 07:05 PM
Post: #7
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Greetings Xerxes!

Thank-you so much for monitoring this discussion and incorporating the findings into your excellent benchmark article.

May I respectfully recommend that the new BITMAP based N-Queens benchmark for WP-34S also be included in article 700, along with the most remarkable test result of 2.1 seconds on F/W version 3.3?

As you can determine from the attached photo, this benchmark developed by forum member Patrice correctly yields the Queen's positions in registers 4, 6, 8, 10, 12, 14, 16 and 18. This benchmark uses the WP-34S Binary operations commands (a-la HP-16C) and so represents a second legitimate example of the WP-34S N-Queens benchmark against the HP-16C. The original method is still relevant of course when comparing calculators that do not support bit functions, so I recommend including both in your article. Smile

Here is the code for Patrice Torchet's BITMAP version:

Code:
WP-34S BITMAP Based N-Queens  (Patrice Torchet)
140    [f] LBL C        
141    TICKS        
142    STO K        
143    [f] BASE 16        
144    WSIZE 32        
145    [h] CLx        
146    STO C    //loop counter    
147    STO I    //Stack pointer    
148    1        
149    0        
150    1        
151    0        
152    1        
153    0        
154    1        
155    STO A    //0x1010101    
156    8        
157    STO B    //remaining Q    
158    [h] MASKR 08        
159    [h] SL 01        
160    STO 00    //0x1FE    
            
161    [f] LBL 02    //x=bitmap y=Q position    
162    INC C        
163    [h] BS?->Y    //test Q position    
164    [h] GTO 05    //Q OK, next line    
            
165    [f] LBL 03    //x=bitmap y=Q position    
166    [h] DSZ Y    //SHIFT QUEEN    
167    [h] GTO 02    //RETRY    
168    INC B    //ootherwise backtrack    
169    RCL->I    //pop a Q    
170    DEC I    //pop a Q    
171    RCL->I    //pop a Q    
172    DEC I    //pop a Q    
173    [h] GTO 03        
            
174    [f] LBL 05    //x=bitmap y=Q position    
175    INC I        
176    STO->I    //Save bitmap    
177    X<>Y        
178    INC I        
179    STO->I    //save Q    
180    X<>Y        
181    DSZ B    //Finish if 0    
182    [h] GTO 01        
183    [h] GTO 04        
            
184    LBL 01        
185    RCL A    //get Q mask    
186    [h] RL->Z    //shift    
187    [h] OR    //merge with bitmap    
188    RCL 00        
189    [h] NOT        
190    [h] AND    //clear combination    
191    [h] RR 09        
192    [h] RRC 08    //shift diagonal    
193    RL 16        
194    [h] RLC 08    //shift diagonal    
195    [h] RR 07        
196    ENTER        
197    ENTER        
198    [h] RL 08        
199    [h] OR    //ORing L V R    
200    ENTER        
201    [h] RL 16        
202    [h] OR    //ORing L V R    
203    RCL 00        
204    [h] AND    //mask Combination    
205    RCL L        
206    [h] XOR    //negate Combination    
207    [f] X=0?    //if 0    
208    [h] GTO 06    //deadend    
209    [h] OR    //merge Combination to bitmap    
210    8        
211    X<>Y        
212    [h] GTO 02        
            
213    [f] LBL 06    //Shortcut    
214    8        
215    STO+C        
216    INC B        
217    RCL->I        
218    DEC I        
219    RCL->I        
220    DEC I        
221    [h] GTO 03        
            
222    [f] LBL 04        
223    RCL C        
224    BASE 10        
225    TICKS        
226    RCL K        
227    -        
228    STO K        
229    R Dn        
230    [g] RTN        
            
bitmap principle            
the bitmap contains 4 parts of 8 bits            
L    Left diagonal locks        
V    Vertical locks        
R    Right diagonal locks        
C    Combination    = L or V or R    This is the part tested (BS?)
            
LLLLLLLVVVVVVVVRRRRRRRRCCCCCCCCL            
register A            
00000001000000010000000100000001            
            
bitmap is rotated to have combination on bits 1 to 8            
register A is rotated too to match bitmap
Queen's positions in registers 4, 6, 8, 10, 12, 14, 16 and 18

P.S. There is also an HP-16C version of this benchmark if you are interested, I am sure that Patrice would be willing to share it.


Attached File(s) Thumbnail(s)
   
Find all posts by this user
Quote this message in a reply
10-16-2014, 08:53 PM
Post: #8
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-16-2014 06:59 PM)Claudio L. Wrote:  Is it still the same algorithm?
It is the same in the sense that it apply the same rules and find a solution.
It is close enough to end by displaying the loop counter with the expected value.

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-17-2014, 08:02 PM
Post: #9
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-16-2014 06:59 PM)Claudio L. Wrote:  Is it still the same algorithm?
I thought the idea was to have the same algorithm ported in all platforms, so they can be directly compared.
I'm sure other calculators will also get a boost if you change the algorithm. I didn't look at the new code in detail, but looks to me like a completely different algorithm.

Is this sufficiently close to the original algorithm so it can be included in the original list? Otherwise will introduce noise into an otherwise very neat collection of results.
There is a problem with the benchmark, the way it is done. It is biaised in more than one way from the beginning.

1) Algorithm chosen: is it not the most powerful. It prevent some calculators from showing their full potential. Ex: hp16C wp34s

2)THe reference source code (BASIC) is poorly written. It use a double array access in the inner loop when a single array access is enough.
- Problem, C code is allowed and C compilers are smart enough to detect that one of the array access is a constant inside the loop and optimize it. Is it fair ?
- Problem, C/RegVars code is directly using a single array access, it is an advantage. Is it fair ?
- Problem, calculators are features poor. hp15c, hp16C and wp34S and many others show poor performance because they stick to the BASIC code and spend their time swaping array pointers in the innier loop. Is it fair ?

You have seen that the wp34s can be 3 times faster, that is huge. Is it fair not to show it in the benchmark?

Every program using the double array access should be rewriten to single array access, just to be fair with C/RegVars code.

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-18-2014, 01:36 AM
Post: #10
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Accurate benchmarking seems to be a very hard task. I choosed the n-queens problem with a simple algorithm, to be able
to compare the basic programming elements of the languages of as many calculators as possible, even if the programming
capabilities are limited. Of course there are much better algorithms to solve this puzzle, but these are not suited
for many calculators.

I think that in most cases the translation of the original BASIC code is accurate enough, because the speed difference
between one array access or two is very small, but as Patrice mentioned, I also see a disadvantage for some RPN versions,
because there is only one register for indirect addressing unlike the WP-34S, that makes the inner loop more complicated
than neccessary. So a RPN friendly translation of the code would be more fair.

If I remember correctly Patrice already tried this with about 30% speed advantage compared to the actual RPN versions.
IMHO using this version would be really fair, because it's RPN friendly but also close to the original algorithm. Too
bad that we haven't used this version from the beginning.

I like the BITMAP solution a lot, but how can I use it in the benchmark list without putting many other calculators at
a disadvantage? Please take into consideration that the BITMAP version would be also much faster on some other non HP
calculators and the comparability of the results will suffer, due to the use of different algorithms.

The BITMAP solution has earned an own article.

Calculator Benchmark
Find all posts by this user
Quote this message in a reply
10-19-2014, 05:35 PM
Post: #11
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Hi XerXes,
I didn't want to bother you with all this, as I have no time to conduct a benchmark by myself. The simple knowledge of what was possible was enough for me.

At best I can give a hand to help optimize a program.

For thoses interested in seeing how bad I speak english Smile , here is a présentation I gave in HHC 2011 https://www.youtube.com/watch?v=g0sQA0H6...SU30gorNFw about what should nit be done and about bitmap.
and here is the slide I used HHC 2011 HP15C Benchmark.ppt

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-20-2014, 01:07 PM
Post: #12
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-17-2014 08:02 PM)patrice Wrote:  There is a problem with the benchmark, the way it is done. It is biaised in more than one way from the beginning.

1) Algorithm chosen: is it not the most powerful. It prevent some calculators from showing their full potential. Ex: hp16C wp34s

2)THe reference source code (BASIC) is poorly written. It use a double array access in the inner loop when a single array access is enough.
- Problem, C code is allowed and C compilers are smart enough to detect that one of the array access is a constant inside the loop and optimize it. Is it fair ?
- Problem, C/RegVars code is directly using a single array access, it is an advantage. Is it fair ?
I think the array access issue is minor, and doesn't fundamentally change the algorithm (you still have to explore the same number of nodes and do exactly the same tasks.
And if a particular language has the ability to do optimization and use register variables with a simple statement or compiler switch, I personally think it's fair to include it as long as it is marked as such (Xerxes did a great job at that). Same thing with the CPU clock, turbo modes, etc. Many implementations are optimized for the particular language/calculator, but they don't change the algorithm. They solve the same problem through the same path to a solution.

(10-17-2014 08:02 PM)patrice Wrote:  - Problem, calculators are features poor. hp15c, hp16C and wp34S and many others show poor performance because they stick to the BASIC code and spend their time swaping array pointers in the innier loop. Is it fair ?

You have seen that the wp34s can be 3 times faster, that is huge. Is it fair not to show it in the benchmark?

Every program using the double array access should be rewriten to single array access, just to be fair with C/RegVars code.

Perhaps they should all use it, but that will have a relatively small impact, and requires extra storage to "cache" that value. I'd say if a calculator has extra registers and can do it, why not. Having extra registers available is a feature of the calculator and is representative of what the calculator can do.
Bit manipulations, on the other hand, is a very specialized subset of features that not all calculators have and is not representative of what the calculator can do at solving problems.
I'd think a lot of problems will require arrays and indexes, but not so many can be solved using bit manipulations, so it gives a better idea of what to expect from a particular calculator/language combination in the real world.
For fun, I coded a non-optimized bitmap solution (no idea if it's similar to yours, I made it up from scratch) in userRPL on the 50g, and it runs in 75 seconds instead of the 90 from the list. It's split in several routines and does heavy use of subroutine calls done by name (ugly and slow), so it's not exactly the fastest it could be, but enough to prove my point: many other calculators could benefit greatly from a bitmap approach, even the assembly and C versions could get a boost from that. Including your solution in the list would mean we have to throw away all other results and run a bitmap solution on all of them.
It's not really about being fair/unfair, it's a question of it being comparable or not to the others, and of it being representative of other problems.
I think you should open a new thread and call it the "N-Queens Calculator Competition", (emphasis on "competition" versus "benchmark") where we don't care about comparison or real-world usage, just get the job done with the fastest algorithm optimized the best you can.
I think a lot of people will start coding their own algorithms and optimizing them for different machines. It could be fun.

Claudio
Find all posts by this user
Quote this message in a reply
10-20-2014, 10:05 PM
Post: #13
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-18-2014 01:36 AM)xerxes Wrote:  ... I also see a disadvantage for some RPN versions,
because there is only one register for indirect addressing unlike the WP-34S, that makes the inner loop more complicated
than neccessary. So a RPN friendly translation of the code would be more fair.

And don't forget that the userRPL version for the HP28/48/49/50 is using lists, which is known to be very slow. It could very well leave all 8 values on the stack and use PICK/UNPICK instead of GET/PUT, taking advantage of the unlimited stack that is the most distinctive feature of this calculator series.
But people trying to solve a problem would probably use lists, so it is representative of the normal use. For the same reason people wouldn't normally use a bitmap solution, or be concerned with the number of indirect addressing done in a loop (making the BASIC version and its port for all targets also representative of normal use).
On the other hand, people using C would instantly get the C/regvars optimization without knowing, so it is representative as well.
I think your article is the most serious attempt ever made to have a benchmark for calculators, and wouldn't call it flawed in any way. For a cross-language, cross-platform benchmark with so many different hardware targets, it's nearly impossible to track every optimization on every target.
The fact that in 2014 people are still talking about it means overall it's a great benchmark.
I used it myself to compare newRPL speed to traditional RPL, so it's actually a development tool too.
Find all posts by this user
Quote this message in a reply
10-21-2014, 09:38 PM
Post: #14
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-20-2014 10:05 PM)Claudio L. Wrote:  And don't forget that the userRPL version for the HP28/48/49/50 is using lists, which is known to be very slow. It could very well leave all 8 values on the stack and use PICK/UNPICK instead of GET/PUT, taking advantage of the unlimited stack that is the most distinctive feature of this calculator series.

I didn't noticed that before, thank you for your advice and kind words.

Calculator Benchmark
Find all posts by this user
Quote this message in a reply
10-21-2014, 09:49 PM
Post: #15
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-18-2014 01:36 AM)xerxes Wrote:  Accurate benchmarking seems to be a very hard task.
Indeed Smile , That's why I didn't try to do a benchmark by myself.

(10-18-2014 01:36 AM)xerxes Wrote:  I like the BITMAP solution a lot, but how can I use it in the benchmark list without putting many other calculators at
a disadvantage? Please take into consideration that the BITMAP version would be also much faster on some other non HP
calculators and the comparability of the results will suffer, due to the use of different algorithms.
It's like doing a benchmark/contest with vehicles from point A to point B, most of the cars will take the road.
And when you test a 4WD, why take the bumpy track shortcut should not be allowed?
And finally when you put you hands on an ugly amphibious vehicle, why shouldn't it just cross the river ?
It is slow and ugly but it can do things that a Ferrari can't.

(10-18-2014 01:36 AM)xerxes Wrote:  The BITMAP solution has earned an own article.
Is it on line already ? URL ?

I propose a set of rules where I take into account that calculators can be poor in resources.

- make a program that build a solution from the rules. You are free to choose any algorithm as long as it solve the problem.
- the result is the solution found as a number of 8 figures. it is valid as long as it is a solution.
- Ideally, after displaying a solution, a simple branch in the program will lead to a second solution
- do not count the number of steps
- the goal is to be as fast as possible.

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-21-2014, 10:51 PM
Post: #16
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
Hi Claudio,
(10-20-2014 01:07 PM)Claudio L. Wrote:  I think the array access issue is minor, and doesn't fundamentally change the algorithm (you still have to explore the same number of nodes and do exactly the same tasks.
Take the HP-15C: The single indirection swap in the inner loop imply a mostly 33% overhead. For you, I don't know, but for me, it is NOT MINOR.

(10-20-2014 01:07 PM)Claudio L. Wrote:  I think you should open a new thread and call it the "N-Queens Calculator Competition", (emphasis on "competition" versus "benchmark") where we don't care about comparison or real-world usage, just get the job done with the fastest algorithm optimized the best you can.
It just append that a benchmark IS a competition about being the fastest

Here is an optimized code for the 15C: the algorithm is exactly the same as in the benchmark. It simply use available features more efficiently.
Code:
42,21,11    LBL A
   42 34    CLEAR REG
       8    8
   44 .0    STO .0        
   44  1    STO 1        
42, 6,.1    ISG .1        
   43  7    DEG
   44  8    STO 8
42, 6, 0    ISG 0        
   
42,21, 1    LBL 1        
   45  0    RCL 0
   44 25    STO I
           STO 9
           ISG 9
           DEG
42, 6,.1    ISG .1        

42,21, 2    LBL 2        
   45  8    RCL 8
45,30,24    RCL - (i)
   43 20    x=0?
   22  3    GTO 3        
   43 16    ABS
   45  9    RCL 9
45,30,25    RCL - I
43,30, 5    TEST 5        
   22  3    GTO 3        

42, 5,25    DSE I        
   22  2    GTO 2
42, 6, 0    ISG 0        
   43  7    DEG
   45  0    RCL 0        
   44 25    STO I
   45 .0    RCL .0
43,30, 5    TEST 5        
   22  4    GTO 4        
   
42, 4,24    X<> 8        
        STO (I)
   22  0    GTO 1

42,21, 3    LBL 3        
42, 5, 8    DSE 8        
   22  1    GTO 1
   45  0    RCL 0
   44 25    STO I
   45 24    RCL (I)
   44  8    STO 8
42, 5, 0    DSE 0        
   22  3    GTO 3
42, 5, 1    DSE 1        
42, 6, 0    ISG 0
   43  7    DEG
42, 6,.1    ISG .1        
   43  7    DEG
   22  0    GTO 1
   
42,21, 4    LBL 4        
   45 .0    RCL .0
   44 25    STO I
   43 35    CLX

42,21, 6    LBL 6        
       1    1
       0    0
      20    *
45,40,24    RCL + (I)
42, 5,25    DSE I
   22  6    GTO 6
   45 .1    RCL .1
   43 32    RTN

registres
0    line poiunter
1-8    queens
.0    Size
.1    counter
it saves about 35% of runtime. It don't think it is minor.

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-22-2014, 01:10 PM
Post: #17
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-21-2014 10:51 PM)patrice Wrote:  Take the HP-15C: The single indirection swap in the inner loop imply a mostly 33% overhead. For you, I don't know, but for me, it is NOT MINOR.
30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection.

(10-21-2014 10:51 PM)patrice Wrote:  It just append that a benchmark IS a competition about being the fastest
What I meant is the goals are different. The objective of the benchmark is not to compete, but to measure and compare. People try to compete (naturally), but that's not what the benchmark is about.
On the benchmark you try to eliminate the user factor (to compare machines). On the competition the user factor is all that matters (it's who does it faster).
For newRPL, for example, I'm not trying to "beat" any particular measures or compete against anything or anybody, yet I'm using this same benchmark just to see where it falls in the "market".

(10-21-2014 10:51 PM)patrice Wrote:  Here is an optimized code for the 15C: the algorithm is exactly the same as in the benchmark. It simply use available features more efficiently.

it saves about 35% of runtime. It don't think it is minor.
And that one should go on the list, as long as it represents what a user would do to solve that problem with the machine and as long as it's still the same algorithm. If your optimizations are too obscure (like the bitmap method) they are not representative of normal use, but using "decrease and skip if not zero" style instructions is OK in my opinion (it's common sense).

Claudio
Find all posts by this user
Quote this message in a reply
10-22-2014, 09:15 PM
Post: #18
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-22-2014 01:10 PM)Claudio L. Wrote:  
(10-21-2014 10:51 PM)patrice Wrote:  Take the HP-15C: The single indirection swap in the inner loop imply a mostly 33% overhead. For you, I don't know, but for me, it is NOT MINOR.
30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means.
The inner loop is 20 steps, when optimized, it is only 15 steps.
15/20= 0.75= 1 -25% : it saves 25% in the inner loop
20/15=1.33= 1 +33% : this is 33% overhead for the inner loop

(10-22-2014 01:10 PM)Claudio L. Wrote:  But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection.
All RPN models are using the same kind of technology and the impact will be essentially the same. What are you comparing the C/Regvars with ?

(10-22-2014 01:10 PM)Claudio L. Wrote:  
(10-21-2014 10:51 PM)patrice Wrote:  It just append that a benchmark IS a competition about being the fastest
What I meant is the goals are different. The objective of the benchmark is not to compete, but to measure and compare. People try to compete (naturally), but that's not what the benchmark is about.
On the benchmark you try to eliminate the user factor (to compare machines). On the competition the user factor is all that matters (it's who does it faster).
What can be the meaning of a comparison when one of the compared thing can be 40% faster? I think the only thing it show is that both can solve the problem.

I think that it is by showing people efficient programing technics that you will teach to use them
If you have the knowledge that lists are slow on RPL, you are free to find another way to solve the problem.

Patrice
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
Find all posts by this user
Quote this message in a reply
10-22-2014, 10:36 PM
Post: #19
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-22-2014 09:15 PM)patrice Wrote:  All RPN models are using the same kind of technology and the impact will be essentially the same. What are you comparing the C/Regvars with ?
With the "structured" C version that's on the benchmark, with no compiler optimizations.

(10-22-2014 09:15 PM)patrice Wrote:  What can be the meaning of a comparison when one of the compared thing can be 40% faster? I think the only thing it show is that both can solve the problem.

I think it still does offer a decent comparison, but as I said previously, since your version uses the same algorithm it should be on the list. I wouldn't put the bitmap solution on the list, but your optimizations seem reasonable and are applicable to other problems.


(10-22-2014 09:15 PM)patrice Wrote:  I think that it is by showing people efficient programing technics that you will teach to use them
If you have the knowledge that lists are slow on RPL, you are free to find another way to solve the problem.
Difference is that I don't think it's a competition, so RPL with lists is perfectly fine with me, and I think it represents how people use RPL. If somebody else wants to optimize it to use the stack, that's fine too.
My observation was not criticism, it went to show that RPN machines are not being unfairly treated, as other models are also in the same position. I'm sure a careful review will reveal that many other platforms/languages could also get a fair amount of optimization, but that doesn't necessarily invalidate the benchmark.

And please notice I have nothing to do with Xerxes, or this benchmark. I never submitted an entry (although somebody else submitted results using my code for the C/Regvars and ARM assembly for the 50g, so you could say I'm the author of those two). I just think its fine the way it is. Perfect? No, but still very useful and informative.

PS: On second thought, if I accept that you are right and this is a competition, would you say I "won"? Should I get at least a T shirt or something for authoring the fastest times on the list, undefeated since 2008? Just joking, of course... (L size, please)
Find all posts by this user
Quote this message in a reply
10-23-2014, 12:29 PM
Post: #20
RE: WP-34S N-Queens Benchmark Based on Patrice Torchet's BITMAP Solution
(10-21-2014 10:51 PM)patrice Wrote:  Here is an optimized code for the 15C: the algorithm is exactly the same as in the benchmark. It simply use available features more efficiently.
Code:
42,21,11    LBL A
   42 34    CLEAR REG
       8    8
   44 .0    STO .0        
   44  1    STO 1        
42, 6,.1    ISG .1        
   43  7    DEG
   44  8    STO 8
42, 6, 0    ISG 0        
   
42,21, 1    LBL 1        
   45  0    RCL 0
   44 25    STO I
           STO 9
           ISG 9
           DEG
42, 6,.1    ISG .1        

42,21, 2    LBL 2        
   45  8    RCL 8
45,30,24    RCL - (i)
   43 20    x=0?
   22  3    GTO 3        
   43 16    ABS
   45  9    RCL 9
45,30,25    RCL - I
43,30, 5    TEST 5        
   22  3    GTO 3        

42, 5,25    DSE I        
   22  2    GTO 2
42, 6, 0    ISG 0        
   43  7    DEG
   45  0    RCL 0        
   44 25    STO I
   45 .0    RCL .0
43,30, 5    TEST 5        
   22  4    GTO 4        
   
42, 4,24    X<> 8        
        STO (I)
   22  0    GTO 1

42,21, 3    LBL 3        
42, 5, 8    DSE 8        
   22  1    GTO 1
   45  0    RCL 0
   44 25    STO I
   45 24    RCL (I)
   44  8    STO 8
42, 5, 0    DSE 0        
   22  3    GTO 3
42, 5, 1    DSE 1        
42, 6, 0    ISG 0
   43  7    DEG
42, 6,.1    ISG .1        
   43  7    DEG
   22  0    GTO 1
   
42,21, 4    LBL 4        
   45 .0    RCL .0
   44 25    STO I
   43 35    CLX

42,21, 6    LBL 6        
       1    1
       0    0
      20    *
45,40,24    RCL + (I)
42, 5,25    DSE I
   22  6    GTO 6
   45 .1    RCL .1
   43 32    RTN

registres
0    line poiunter
1-8    queens
.0    Size
.1    counter

Hi Patrice, IMHO this version fulfils the requirements of being fair and accurate enough in sense of comparing the same algorithm,
whats the goal of the n-queens benchmark. Thank you for your effort and showing whats possible.

Calculator Benchmark
Find all posts by this user
Quote this message in a reply
Post Reply 




User(s) browsing this thread: 1 Guest(s)