grsbanks · 01-04-2018, 08:38 AM

(01-04-2018 08:06 AM)Dieter Wrote: LBL C is not required, the 35s can branch to any program line. So remove LBL C and GTO B007 instead. This way the loop also is one step shorter.

And a lot faster in execution time because the 35s doesn't have to search through program memory for the LBL at each GTO (it isn't cached in the 35s).

pier4r · (This post was last modified: 01-04-2018 08:53 AM by pier4r.)

(01-04-2018 07:49 AM)Dieter Wrote: BTW, the real 35s nerd of course replaces the 3 with pi IP which consumes less memory and probably is even faster. ;-)

Dieter

What the... is it faster to perform an operation, "pi IP", than to put a constant (3) in a program? Besides, pi is itself a constant, just like 3.

Now I am interested. Why?

TheKaneB · 01-04-2018, 10:00 AM

(01-04-2018 08:38 AM)grsbanks Wrote:
(01-04-2018 08:06 AM)Dieter Wrote: LBL C is not required, the 35s can branch to any program line. So remove LBL C and GTO B007 instead. This way the loop also is one step shorter.

And a lot faster in execution time because the 35s doesn't have to search through program memory for the LBL at each GTO (it isn't cached in the 35s).

Since I only have 2 labels in memory, I think the gain would be negligible with only 1000 cycles, though. Of course I should do an experiment to verify my statement :-)

Same thing with pi IP: I can't see why it should be any faster. Does the 35s have a separate data type for integers, or is just removing a RCL enough to get a significant speedup?

BartDB · (This post was last modified: 01-04-2018 05:32 PM by BartDB.)

(01-04-2018 08:53 AM)pier4r Wrote:
(01-04-2018 07:49 AM)Dieter Wrote: BTW, the real 35s nerd of course replaces the 3 with pi IP which consumes less memory and probably is even faster. ;-)

Dieter
What the... is it faster to perform an operation, "pi IP", than to put a constant (3) in a program? Besides, pi is itself a constant, just like 3.

Now I am interested. Why?

I can confirm that using "pi IP" uses less memory and is faster.

Looking at the decrease of total memory available (because program length LN=xxx is erroneous), a program line with a constant consumes 38 bytes and a program line with a reserved word uses 3 bytes.

From https://en.m.wikipedia.org/wiki/HP_35s :
"Since complex numbers and vectors of up to three elements can be stored as a single value, each data variable occupies 37 bytes, enough for a type indicator and three floating-point numbers."
It seems the same goes for a constant in a program line.
(Thus 37 bytes for the constant + 1 for the line itself, from which we could deduce a reserved word = 2 bytes).

Test programs created: (again length quoted looks at the decrease of total memory available)

Code:

A001 LBL A
A002 1000.000
A003 STO Z
A004 CLx
A005 3
A006 +
A007 DSE Z
A008 GTO A005
A009 RTN

Length = 97 bytes

A001 LBL A
A002 1000.000
A003 STO Z
A004 CLx
A005 pi
A006 IP
A007 +
A008 DSE Z
A009 GTO A005
A010 RTN

Length = 65 bytes

Program with constant 3 executes in about 1m 14s and the "pi IP" in about 1m 04s
Thus for this case a 32 byte and 10s saving.
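The byte counts above can be reproduced with a small accounting sketch. It assumes, as the MEM readings suggest, 38 bytes for a line holding a numeric constant and 3 bytes for every other line. The function names are made up for illustration, and the per-line costs are inferred from the readings, not taken from HP documentation.

```python
# Assumed per-line costs, inferred from the MEM readings quoted above.
CONSTANT_LINE_BYTES = 38  # line holding a numeric constant
OTHER_LINE_BYTES = 3      # any other line (LBL, STO, pi, IP, ...)


def is_numeric_constant(step: str) -> bool:
    """Return True if a program step is a plain number such as '3' or '1000.000'."""
    try:
        float(step)
        return True
    except ValueError:
        return False


def estimated_length(steps: list[str]) -> int:
    """Sum the assumed per-line cost over all program steps."""
    return sum(
        CONSTANT_LINE_BYTES if is_numeric_constant(step) else OTHER_LINE_BYTES
        for step in steps
    )


with_constant = ["LBL A", "1000.000", "STO Z", "CLx", "3", "+",
                 "DSE Z", "GTO A005", "RTN"]
with_pi_ip = ["LBL A", "1000.000", "STO Z", "CLx", "pi", "IP", "+",
              "DSE Z", "GTO A005", "RTN"]

print(estimated_length(with_constant))  # 97
print(estimated_length(with_pi_ip))     # 65
```

Under these assumptions the model matches both measured lengths, and the 32-byte difference is one 38-byte constant line traded for two 3-byte command lines.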

EDIT: Note that each program was entered on its own in a cleared calculator in an attempt to get accurate memory use.

pier4r · 01-04-2018, 01:21 PM

(01-04-2018 12:25 PM)BartDB Wrote: Program with constant 3 executes in about 1m 14s and the "pi IP" in about 1m 04s
Thus for this case a 32 byte and 10s saving.

Thanks for the test and the result still baffles me.

Ok for the memory usage, I don't mind it. Anyway the execution is something that I cannot yet understand.

With '3' the CPU has to load the data from memory and then use it.
With 'pi IP' the CPU has to load the data from memory, load the function, execute it, and then use the result.

rprosperi · 01-04-2018, 02:16 PM

(01-04-2018 07:49 AM)Dieter Wrote: Perhaps I should have suggested a different variable name than "T" (as in "three") to avoid confusion: The stack's T-register is not affected here, every "T" refers to a variable T that holds the constant "3" which has been initially stored there.

Thanks for clarifying Dieter. I struggled with this, since your initial advice was to use a variable, but it was the use of the word register (vs. variable) that convinced me; I thought the concept in your advice was changed to using the stack.

TheKaneB · (This post was last modified: 01-04-2018 02:38 PM by TheKaneB.)

@BartDB: thanks for the explanation, very helpful!
I am still baffled by the execution time difference, I have the same concerns that pier4r expressed earlier.

My speculation is that the symbol "pi" is a pointer to the actual constant, so only 2 bytes are shuffled around the stack instead of 37, which would save a lot of time in load/store operations. Maybe the IP command also uses some internal data type trick to clear out the decimal part in a few cycles.

Dieter · (This post was last modified: 01-05-2018 11:31 AM by Dieter.)

(01-04-2018 01:21 PM)pier4r Wrote: Thanks for the test and the result still baffles me.

Ok for the memory usage, I don't mind it. Anyway the execution is something that I cannot yet understand.

Then take a look at the results below. ;-)

First of all, I think I was wrong about the memory usage of numeric constants. My knowledge essentially was what BartDB said: any numeric constant takes 37 bytes. But I think this is wrong and the 37 bytes refer to the memory used by the data registers, including the 800 indirect ones on the 35s. For numbers in programs it looks like it's actually 3 bytes plus one more byte for each digit / decimal point / E / sign. So a simple "3" requires four bytes. Regular 33s/35s commands occupy 3 bytes, so "pi IP" is 6 bytes, i.e. two bytes more than a plain 3.
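As a quick check of the arithmetic in the rule just described, here is a minimal sketch. It assumes the "3 bytes plus one per character" estimate for inline numbers and 3 bytes per regular command; both figures come from this post, and the helper name is hypothetical.

```python
COMMAND_BYTES = 3     # assumed size of a regular command such as pi or IP
LITERAL_OVERHEAD = 3  # assumed fixed part of an inline number


def literal_bytes(text: str) -> int:
    """Estimated size of an inline number: 3 bytes plus one per character
    (digit, decimal point, E or sign)."""
    return LITERAL_OVERHEAD + len(text)


print(literal_bytes("3"))         # 4
print(2 * COMMAND_BYTES)          # 6, the cost of "pi IP"
print(literal_bytes("1000.000"))  # 11
```

By this estimate "pi IP" costs two bytes more than a plain 3, so the replacement would pay off in speed only, not in memory.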

But indeed the execution speed significantly varies with the way constants are used in 35s programs. Here is an example with 100 loops of adding 3√3+3 where the constant "3" is coded in different ways:

Code:

B001 LBL B
B002 100
B003 STO C
B004 CLSTK
B005 3
B006 SQRT
B007 3
B008 x
B009 3
B010 +
B011 +
B012 DSE C
B013 GTO B005
B014 RTN

This straightforward, plain vanilla code runs in about 17.5 s.

Next version:

Code:

B001 LBL B
B002 100
B003 STO C
B004 CLSTK
B005 pi
B006 IP
B007 SQRT
B008 pi
B009 IP
B010 x
B011 pi
B012 IP
B013 +
B014 +
B015 DSE C
B016 GTO B005
B017 RTN

Replacing the number "3" with pi IP speeds up the program:
The above code runs in about 14 s.

Here's another version, this time using LastX:

Code:

B001 LBL B
B002 100
B003 STO C
B004 CLSTK
B005 3
B006 SQRT
B007 LastX
B008 x
B009 LastX
B010 +
B011 +
B012 DSE C
B013 GTO B005
B014 RTN

Avoiding two out of three numeric constants gives a significant boost compared to the first version:
The above code runs in only 13 s, comparable to the previous pi IP version, but requiring three steps fewer per loop (9 instead of 12), which may account for the slight difference.

And yet another version:

Code:

B001 LBL B
B002 100
B003 STO C
B004 3
B005 STO T
B006 CLSTK
B007 RCL T
B008 SQRT
B009 RCL T
B010 x
B011 RCL T
B012 +
B013 +
B014 DSE C
B015 GTO B007
B016 RTN

Using RCL and avoiding inline numbers completely speeds up the program even more.
The above code requires just about 10 s.

And finally this one:

Code:

B001 LBL B
B002 100
B003 STO C
B004 3
B005 STO T
B006 CLSTK
B007 RCL T
B008 SQRT
B009 RCLxT
B010 RCL+T
B011 +
B012 DSE C
B013 GTO B007
B014 RTN

Using RCL arithmetic saves two lines and squeezes out one more second.
The above code now runs in about 9 s.

So it looks like you can almost double the execution speed of this program by choosing the right method that best fits the 35s.
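To put a number on "almost double", the timings reported for the five variants can be turned into speedup factors relative to the plain inline-number version. The figures are the approximate stopwatch values from this post; the short variant names are labels chosen for this sketch.

```python
# Approximate run times in seconds for 100 loops, as reported above.
timings_s = {
    "inline 3": 17.5,
    "pi IP": 14.0,
    "LastX": 13.0,
    "RCL T": 10.0,
    "RCL arithmetic": 9.0,
}

baseline = timings_s["inline 3"]
speedups = {name: baseline / seconds for name, seconds in timings_s.items()}

for name, factor in speedups.items():
    print(f"{name:15s} {timings_s[name]:5.1f} s   x{factor:.2f}")
```

The last row works out to roughly 1.94 times the speed of the first version, given that the inputs are hand-timed and only accurate to about half a second.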

Finally, here is another example with three different numeric constants:

Code:

B001 LBL B
B002 100
B003 STO C
B004 CLSTK
B005 3
B006 SQRT
B007 2
B008 x
B009 1
B010 +
B011 +
B012 DSE C
B013 GTO B005
B014 RTN

The standard version runs in about 17.5 s, just as the first program did. This could be expected as only the values of the three constants are different.

Now try replacing 3, 2 and 1 with other commands:

Code:

B001 LBL B
B002 100
B003 STO C
B004 CLSTK
B005 pi
B006 IP
B007 SQRT
B008 e
B009 IP
B010 x
B011 pi
B012 SGN
B013 +
B014 +
B015 DSE C
B016 GTO B005
B017 RTN

Although this version has more steps per loop, avoiding the three numeric constants and replacing them with IP(pi), IP(e) and sign(pi) yields a speedup which makes the program finish in merely 13.5 seconds. Finally, an "ENTER +" instead of the multiplication saves another second, so that we get below 13 s.
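For anyone who wants to confirm that the substitutions leave the result unchanged, here is a small sketch in Python rather than on the calculator. It assumes IP truncates toward zero and SGN returns the sign; the helper names `ip` and `sgn` are made up for this example.

```python
import math


def ip(x: float) -> int:
    """Integer part, truncating toward zero (what IP is assumed to do)."""
    return math.trunc(x)


def sgn(x: float) -> int:
    """Sign of x: -1, 0 or 1."""
    return (x > 0) - (x < 0)


three, two, one = ip(math.pi), ip(math.e), sgn(math.pi)
print(three, two, one)  # 3 2 1

# One pass of the loop body: sqrt(3) * 2 + 1
with_literals = math.sqrt(3) * 2 + 1
with_commands = math.sqrt(three) * two + one

# "ENTER +" adds the square root to itself instead of multiplying by 2.
root = math.sqrt(three)
with_enter_plus = (root + root) + one

print(math.isclose(with_literals, with_commands))    # True
print(math.isclose(with_literals, with_enter_plus))  # True
```

All three forms give the same value per pass, so any timing difference comes from how the calculator executes the steps, not from what they compute.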

Now, why are regular inline numbers so slow? I suspect this is because they are handled as equations. This would also match their memory usage (3+n bytes). Equations have to be parsed each time the program comes across one, and this requires some time.

(01-04-2018 08:38 AM)grsbanks Wrote:
(01-04-2018 08:06 AM)Dieter Wrote: LBL C is not required, the 35s can branch to any program line. So remove LBL C and GTO B007 instead. This way the loop also is one step shorter.

And a lot faster in execution time because the 35s doesn't have to search through program memory for the LBL at each GTO (it isn't cached in the 35s).

I don't think that removing the second label from Antonio's code yields any significant speedup (except for one step less per loop) as the 35s does not search labels the way classic HPs did: there is no "GTO C", i.e. "search for label C and continue there". The code is "GTO C001", i.e. "branch to line 001 of program C". So I don't think there is a speed difference between "GTO C001" and "GTO B007". In both cases it's (more or less) direct line addressing.
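The distinction between the two branching styles can be illustrated with a toy model. This is only a sketch of the two lookup strategies described above: it makes no claim about how the 35s firmware or the classic HPs are actually implemented, and the program contents and function names are hypothetical.

```python
def find_label(program: list[str], label: str) -> tuple[int, int]:
    """Label search: scan from the top for 'LBL <label>'.

    Returns (1-based line number of the label, number of steps examined).
    """
    target = f"LBL {label}"
    for examined, step in enumerate(program, start=1):
        if step == target:
            return examined, examined
    raise ValueError(f"label {label} not found")


def line_of(program: list[str], line_number: int) -> tuple[str, int]:
    """Direct line addressing: one indexed access, no scan.

    Returns (step at that line, number of steps examined).
    """
    return program[line_number - 1], 1


# Hypothetical program: filler steps followed by a label at the very end.
program = ["LBL B"] + ["+"] * 98 + ["LBL C"]

print(find_label(program, "C"))  # (100, 100)
print(line_of(program, 7))       # ('+', 1)
```

In the search model the cost grows with the distance to the label, while in the direct model it is constant. If "GTO C001" and "GTO B007" are both resolved in the direct style, as suggested above, neither should be noticeably faster than the other.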

Dieter

Edit: corrected a few listings, especially line numbers

BartDB · 01-04-2018, 07:31 PM

(01-04-2018 07:00 PM)Dieter Wrote: First of all, I think I was wrong about the memory usage of numeric constants. My knowledge essentially was what BartDB said: any numeric constant takes 37 bytes. But I think this is wrong and the 37 bytes refer to the memory used by the data registers, including the 800 indirect ones on the 35s. For numbers in programs it looks like it's actually 3 bytes plus one more byte for each digit / decimal point / E / sign. So a simple "3" requires four bytes. Regular 33s/35s commands occupy 3 bytes, so "pi IP" is 6 bytes, i.e. two bytes more than a plain 3.

In actual fact it is difficult to determine the memory usage of anything on this calculator.

With a cleared calculator:
MEM shows 30,192 bytes available
add LBL A
MEM shows 30,189 bytes available, i.e. a line with a LBL uses 3 bytes
add A002 3
MEM shows 30,151 bytes available, i.e. adding line with a constant takes 38 bytes

But with a calculator with 2 small programs and a few equations in it:
MEM shows 29,482 bytes available
add LBL X (because A is already in use)
MEM shows 29,482 bytes available, so now a line with a LBL uses 0 bytes?
add X002 3
MEM shows 29,447 bytes available, so now adding line with a constant takes 35 bytes?

Confusing.
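The per-line costs implied by the two sets of MEM readings can be worked out mechanically. The sketch below only subtracts consecutive readings; the list names are labels chosen here.

```python
def deltas(readings: list[int]) -> list[int]:
    """Bytes consumed between consecutive MEM readings."""
    return [before - after for before, after in zip(readings, readings[1:])]


# MEM readings: before, after adding the LBL line, after adding the constant 3.
cleared_calculator = [30192, 30189, 30151]
calculator_in_use = [29482, 29482, 29447]

print(deltas(cleared_calculator))  # [3, 38]
print(deltas(calculator_in_use))   # [0, 35]
```

The two runs disagree by 3 bytes on each line, even though the total for both lines together (41 versus 35 bytes) also differs, so the MEM display alone does not give a consistent per-line cost.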

The only thing we can be sure about is the speed advantage, as this is measured with a device that's not part of the calculator.

Dieter · 01-04-2018, 08:33 PM

(01-04-2018 07:31 PM)BartDB Wrote: In actual fact it is difficult to determine the memory usage of anything on this calculator.

Yes, indeed.

(01-04-2018 07:31 PM)BartDB Wrote: With a cleared calculator:
MEM shows 30,192 bytes available
add LBL A
MEM shows 30,189 bytes available, i.e. a line with a LBL uses 3 bytes
add A002 3
MEM shows 30,151 bytes available, i.e. adding line with a constant takes 38 bytes
...

Maybe you'd better check the LN= values for each program. These look more consistent to me. In the MEM menu press "2" (PGM) and select your program/label. You should see that constants with n digits require 3+n bytes.

(01-04-2018 07:31 PM)BartDB Wrote: The only thing we can be sure about is the speed advantage, as this is measured with a device that's not part of the calculator.

;-)

Dieter