Threaded Mode | Linear Mode

pier4r · (This post was last modified: 05-02-2017 08:04 PM by pier4r.)

While reading the general forum, I ended up rereading the first page of the HRAST basic, pointing to the calculator benchmark. (The HRAST basic produces quite neat results, in line with sysrpl versions)

I thought that the benchmark, here, was not maintained since long time. Instead it has many new results! I wonder how the maintainer tracks the new results (Xerses seems registered here, just not so active). Kudos to him.

Said that, I believe that executions that are getting under 1 seconds are increasingly limited by the overhead of just 'starting' the computation. Therefore I would suggest to expand the benchmark for faster devices, for example nqueens with n=9, to see how much the real computation takes on the "fast" device.

For example, in a recent topic that I found here, a user reports a Free42 results that I believe are limited by the need to start the program (same for the HP prime application on similar android devices).

I remember I had this idea already and I checked the benchmarks that I added on the wiki4hp in 2013 (digression1). In particular the middle square benchmark (that is not a benchmark, rather a challenge, more of it later) that I proposed after reading the article of the middle square method used by von Neumann.

Here is the idea:

Code:

k:= a positive integer

n:= 100^k

For i:=(n/10) to (n-1) do {

  old := i

  sequenceIsConvergent := false

  limitCyclicSequences := 0 //because we can end in cyclic sequences and we don't want to use a lot of memory!

  while ( (NOT sequenceIsConvergent) AND (limitCyclicSequences < n) ) {

    new = extract_middle_number from old^2 (see below)

    if new == old { 

      sequenceIsConvergent = true

    }

    else { 

      old := new

      limitCyclicSequences = limitCyclicSequences+1;

    }    

  }//end while

}//end for

->Result: the time spent to finish the computation given the starting k value.

How does extract_middle_digits work?

Given n, that is a power of 10 of the type 10^d, 

its square is equal to 10^(d*2) and it has (d*2+1) digits.

We consider only the last (d*2) digits. 

That is: if n=10000, 10000*10000 = 100000000 we consider only 00000000 without the first 1.

Then we 'consider' all the numbers lower than 10^(d*2) with (d*2) digits. 

For example if d=4 and the number under process is 1542, 

we consider 1542^2 as 02377764 instead of 2377764.

Why d=4 ? Because we use as seed, with n=10000, numbers from 1000 to 9999, so numbers having 4 digits.

After that we pick the d=4 middle digits, from 02377764 we pick 02[3777]64.

So the middle number extracted is 3777.

This type of benchmark is pretty scalable, changing 'k', and should expose also the relative timings between a fast device/implementation vs a slower one as soon as k changes.

Anyway a user pointed out in my previous topic on the old forum that a benchmark is useful as comparison if similar set of instructions are used between calculators. Instead the middle square procedure can take advantage of instructions that some calculators does not have (Like IP and FP). Furthermore one is free to derive its own method to extract the middle digits, not exactly a fixed procedure.

Therefore more than a benchmark it is a challenge, where every calculator (and programming languages for those) is welcomed.
Since it is a challenge, feel free to find the most efficient method to compute the middle squares within the given constraints (k, n, i in the pseudocode above, the rest is optional).

According to the results previously collected, with 100^1 :
- 50g, ARM ASM / SATURN ASM under 1 second - same problem of the 8 queens benchmark
- 50g sysRPL around 1 second
- 50g userRPL around 10 seconds

With 100^2
- 50g, ARM ASM 169 sec
- 50g, saturn ASM 1445 secs - with k=2 one can appreciate the difference between saturn ASM and ARM ASM a bit more.
- 50g userRPL (not optimized) estimated over 250000 seconds

I don't think that calculators that handles the 8 queens benchmark in more than 10 minutes can end the challenge in little time ('little' according to your patience), but maybe others like the 48 series or those HP palmtops can do it.

What about the newRPL? (I can attempt this on the PC version, but that does not count)
Anyone willing to do it with HRAST BASIC?
(any other stable language for the 50g? Hp lua AFAIK is not so stable)
42s?
48 variants?
HP prime?
Casio/Ti ? (I will ask on the respective communities if I collect some more results here)

Is anyone willing to give it a shoot? I will try to implement it on the ti89, nspire and, why not, the free42 just as reference (also to learn a bit the RPN programming, although it looks like assembly)

digression1: there are many more to add, a simple search like site:hpmuseum.org benchmark returns hundreds of results to check, with at least tens of different benchmarks.

PS: The challenge itself could be even translated in a list processing problem now that I think about it, since one can work on the single digits. Although one has to implement operations for digits operations.

toml_12953 · 05-02-2017, 07:15 PM

(05-02-2017 05:18 PM)pier4r Wrote: How does extract_middle_digits work?
Given n, that is a power of 10 of the type 10^d,
its square is equal to 10^(d*2) and it has (d*2+1) digits.

We consider only the last (d*2) digits.
That is: if n=10000, 10000*10000 = 100000000 we consider only 00000000 without the first 1.

Then we 'consider' all the numbers lower than 10^(d*2) with (d*2) digits.
For example if d=4 and the number under process is 1542,
we consider 1542^2 as 02377764 instead of 2377764.

After that we pick the d=4 middle digits, from 02377764 we pick 02[3777]64.
So the middle number extracted is 3777.

Sorry to be dense but I don't get it. Is d = log(n)? If so then log(1542) is 3.188... How do you get d=4?

Tom L

Claudio L. · 05-02-2017, 07:36 PM

(05-02-2017 05:18 PM)pier4r Wrote: What about the newRPL? (I can attempt this on the PC version, but that does not count)

You have an RPL solution already, just copy/paste to an SD card with proper Unicode characters and run it on newRPL. It's about time you give it a shot on the real hardware, USB support will not be ready any time soon.

pier4r · (This post was last modified: 05-02-2017 08:03 PM by pier4r.)

(05-02-2017 07:15 PM)toml_12953 Wrote:
(05-02-2017 05:18 PM)pier4r Wrote: How does extract_middle_digits work?
Given n, that is a power of 10 of the type 10^d,
its square is equal to 10^(d*2) and it has (d*2+1) digits.

We consider only the last (d*2) digits.
That is: if n=10000, 10000*10000 = 100000000 we consider only 00000000 without the first 1.

Then we 'consider' all the numbers lower than 10^(d*2) with (d*2) digits.
For example if d=4 and the number under process is 1542,
we consider 1542^2 as 02377764 instead of 2377764.

After that we pick the d=4 middle digits, from 02377764 we pick 02[3777]64.
So the middle number extracted is 3777.

Sorry to be dense but I don't get it. Is d = log(n)? If so then log(1542) is 3.188... How do you get d=4?

Tom L

Nothing about "to be dense", it is just my explanation not clear.

In the example I'm using 1542, therefore the numbers used as seed are from 1000 to 9999. So we need seeds with 4 digits (or considered as having 4 digits, in the case of leading zeroes).

You can see it also from n. n is 100^2 or 10000 in the example, without considering the leading 1, those are 4 digits.

@Claudio: I gave a shot on the real hw, but moving back and forth from the 2.15 hw (that is more comfortable to use with frequent changes) is not really feasible for me because the alternative is to frequently use the sd card. For that I need a second 50g. I will check ebay until I get one for a low price. Anyway thanks for the info, at least now I know.

Claudio L. · 05-02-2017, 08:02 PM

(05-02-2017 08:00 PM)pier4r Wrote: @Claudio: I gave a shot on the real hw, but moving back and forth from the 2.15 hw (that is more comfortable to use with frequent changes) is not really feasible for me because the alternative is to frequently use the sd card. For that I need a second 50g. I will check ebay until I get one for a low price. Anyway thanks for the info, at least now I know.

No worries, I actually looked at a possible USB library just for you, but there isn't much for this chipset. I did find one for an AVR but porting effort will be quite significant.

pier4r · (This post was last modified: 05-02-2017 08:07 PM by pier4r.)

(05-02-2017 08:02 PM)Claudio L. Wrote: No worries, I actually looked at a possible USB library just for you, but there isn't much for this chipset. I did find one for an AVR but porting effort will be quite significant.

I'm honored, but I would say "first things first", even if the newRPL would be highly useful "only" on the PC would be already great (I assume that the PC version is, like the 50g version, compiled with the right target architecture, so it is quite efficient). So do not bother about my requests. I wait and having an excuse for a 2nd 50g is never bad.

xerxes · (This post was last modified: 05-03-2017 04:15 PM by xerxes.)

(05-02-2017 05:18 PM)pier4r Wrote: Said that, I believe that executions that are getting under 1 seconds are increasingly limited by the overhead of just 'starting' the computation. Therefore I would suggest to expand the benchmark for faster devices, for example nqueens with n=9, to see how much the real computation takes on the "fast" device.

I noticed the issue of accurate timing for very fast results in the beginning of making the benchmark already,
especially when starting with the assembly languages. The solution was to use an outer loop, that allows a much
more accurate timing and making the overhead insignificant. I've used 100000 iterations for the SH-3 assembly
version for example, what should be accurate enough, I guess. I've also tested larger chess boards for verifying
the same speed factor. Thanks for pointing this out.

pier4r · 05-03-2017, 04:31 PM

Yes an idea is to solve the problem multiple times in a scalable way or to enlarge the problem. Actually just repeating the same problem multiple time ensures the most direct way of comparison. Like "this device solved the problem in 20 seconds, this other device solved the problem 1350 times in 20 seconds".

I did not think of it, quite straightforward and effective.

Helix · (This post was last modified: 05-04-2017 03:27 PM by Helix.)

(05-02-2017 05:18 PM)pier4r Wrote: I thought that the benchmark, here, was not maintained since long time. Instead it has many new results! I wonder how the maintainer tracks the new results (Xerses seems registered here, just not so active). Kudos to him.

I agree that Xerxes has made a nice work in maintaining this page, and furthermore his presentation is very clean and simple to interpret at first glance.

(05-02-2017 05:18 PM)pier4r Wrote: Said that, I believe that executions that are getting under 1 seconds are increasingly limited by the overhead of just 'starting' the computation. Therefore I would suggest to expand the benchmark for faster devices, for example nqueens with n=9, to see how much the real computation takes on the "fast" device.

The main limitation of this benchmark in my opinion is the use of integers exclusively, which greatly favors some languages. As soon as there are some calculations with reals, this benchmark is grossly misleading.

For example, if I take the HP 50G with User RPL as a reference, then this benchmark gives the following increases of speed :
HP Prime : 65x
HP 200LX with Turbo Pascal : 213x
HP 50G with newRPL : 219x

Now, if I consider this very simple Calculator Performance Index, which uses calculations on reals, the results are dramatically different :
HP 200LX with Turbo Pascal : 5.5x
HP 50G with newRPL (12 digits) : 17.5x
HP Prime : 48x

And finally, the Savage Benchmark, which relies heavily on transcendental functions, gives the following results:
HP 200LX with Turbo Pascal : 1.5x
HP 50G with newRPL (12 digits) : 4.3x
HP Prime : 114x

I like a lot the "Calculator Performance Index", which is probably rather representative of usual scientific programs. The table of benchmarks is not as complete as the Xerxes Table, but it is instructive.

pier4r · 05-03-2017, 09:46 PM

Nice input. I saw such benchmarks with floats also in the old forum, we can integrate it on the wiki 4 hp and expand it.

xerxes · 05-04-2017, 01:46 PM

(05-03-2017 09:18 PM)Helix Wrote: The main limitation of this benchmark in my opinion is the use of integers exclusively, which greatly favors some languages. As soon as there are some calculations with reals, this benchmark is grossly misleading.

I see nothing misleading here, because it's a question of correct interpretation of the test and the results.
The intention of choosing n-queens was to have an integer benchmark with array access to have a fairly realistic
comparison for this type of programming problems and not testing the floating point funtions, what others did before.
IMHO transcendental functions are not well suited for testing the efficiency of a programming language. It's true, that
n-queens strongly favours languages with integer support, but that can be said for all types of integer only problems.
Your examples show clearly that there cannot be one overall benchmark.

toml_12953 · 05-04-2017, 02:07 PM

(05-04-2017 01:46 PM)xerxes Wrote:
(05-03-2017 09:18 PM)Helix Wrote: The main limitation of this benchmark in my opinion is the use of integers exclusively, which greatly favors some languages. As soon as there are some calculations with reals, this benchmark is grossly misleading.

Your examples show clearly that there cannot be one overall benchmark.

Exactly. In order for a benchmark to be meaningful, it has to reflect the type of work you typically do. For surveyors, pilots and some others transcendental functions would be of great importance. For gamers, integer functions might be more relevant. It's important to find benchmarks that test your own real-world scenarios.

Tom L

Helix · 05-04-2017, 02:37 PM

(05-04-2017 01:46 PM)xerxes Wrote: I see nothing misleading here, because it's a question of correct interpretation of the test and the results.

My wording was perhaps inadequate, but I agree with your explanations. For those interested in floating point operations, the 8 queens benchmark is not well suited. But I'm not aware of many complete calculator benchmarks which use floating point functions.
Perhaps this one : http://www.wiki4hp.com/doku.php?id=benchmarks:savage

HrastProgrammer · (This post was last modified: 05-09-2017 03:41 AM by HrastProgrammer.)

(05-02-2017 05:18 PM)pier4r Wrote: Anyone willing to do it with HRAST BASIC?

This is the appropriate HRAST BASIC program based on the direct conversion of the above pseudocode for K=1:

Code:

WATCH INTEGER K=1,N=100^K,Q=SQR N

FOR I=N/10 TO N-1 O=I FOR L=N R=MOD(^O/Q,N) IF O<>R@ O=R NEXT ELSE LEAVE

NEXT ? TICKS

I don't have a real calculator with me and I could only test it on Emu48 with "Authentic Calculator Speed" enabled. The execution time is around 4.6s.

I will check it on the real HP-50G when I'll be back home.

HrastProgrammer · (This post was last modified: 05-09-2017 03:41 AM by HrastProgrammer.)

Real calculator execution time for the above HRAST BASIC program:

HP-50G ... 1.94s

HP-49G ... 4.81s (the time on HP-48GX should be the same)

pier4r · (This post was last modified: 05-07-2017 08:23 AM by pier4r.)

I add the results.

Added.

Vtile · 05-07-2017, 11:12 PM

There is Pascal compiler floating around for 50g, but it were PC side IIRC. HP-Pascak or similar.

pier4r · 05-08-2017, 06:30 AM

You mean this: http://hppascal.free.fr/pages/home.htm ?

It seems a (pretty nice) project that transform the instruction in assembly for the Saturn CPU, not for ARM. I would believe the HRAST BASIC would do the same. Still impressive.

Massimo Gnerucci · 05-08-2017, 11:44 AM

(05-08-2017 06:30 AM)pier4r Wrote: You mean this: http://hppascal.free.fr/pages/home.htm ?

It seems a (pretty nice) project that transform the instruction in assembly for the Saturn CPU, not for ARM. I would believe the HRAST BASIC would do the same. Still impressive.

Yes, that's it.
Unfortunately development is frozen since so many years...

pier4r · 05-09-2017, 04:03 PM

(05-07-2017 07:45 AM)HrastProgrammer Wrote: Real calculator execution time for the above HRAST BASIC program:

HP-50G ... 1.94s

HP-49G ... 4.81s (the time on HP-48GX should be the same)

Since I cannot send you a pm, remember that the wiki is open Smile

. Everyone can contribute.

Added the new result and code to the page of the challenge.