[34S] Time diff. STO.00 vs. STO 00 ?
03-12-2014, 12:55 AM
Post: #1
[34S] Time diff. STO.00 vs. STO 00 ?
I'm running out of room on the 8-level stack, so I have begun using local registers. Is there any theoretical difference in execution time between STO/RCL on a local register (STO .00) and on a global one (STO 00)?

Sikuq
03-12-2014, 01:47 AM
Post: #2
RE: [34S] Time diff. STO.00 vs. STO 00 ?
I would say not noticeably.
The code paths are fractionally different but not enough to really matter. Integer operations and pointer shuffling on the device are insanely fast. Setting up to execute a command takes much longer.


- Pauli
03-12-2014, 01:49 AM
Post: #3
RE: [34S] Time diff. STO.00 vs. STO 00 ?
Then you could always write a small program to time, say, 100,000 STO operations in a loop. Just don't put the LocR command inside the loop :-)

- Pauli
03-12-2014, 02:25 AM
Post: #4
RE: [34S] Time diff. STO.00 vs. STO 00 ?
Pauli,

Thanks again for responding.

I was indeed thinking of a quick timing loop, but I was more interested in the underlying theory you just described - thanks. Yes, LocR gave me some surprises when I tried to deallocate on the fly via LocR 00.

What surprised me was the persistence of the locals even after I exited the program but did not yet detach the pointer. I was wondering if the internal local assignment perhaps was stored in a faster but volatile memory bank (i.e. stack) than the global registers. Unfortunately I don't know if the 34s hardware has such internal sophistications. That's why I try to on-stack my interim data as I assume it is a lot faster to do stack rolls than say STO xx and then RCL xx.

While here, did you touch upon the internal housekeeping of the 34s in respect to (de)fragmentation of files? Do the files individually have to be completely contiguous?

-Sikuq
03-12-2014, 02:46 AM
Post: #5
RE: [34S] Time diff. STO.00 vs. STO 00 ?
(03-12-2014 02:25 AM)Sikuq Wrote:  What surprised me was the persistence of the locals even after I exited the program but did not yet detach the pointer. I was wondering if the internal local assignment perhaps was stored in a faster but volatile memory bank (i.e. stack) than the global registers. Unfortunately I don't know if the 34s hardware has such internal sophistications. That's why I try to on-stack my interim data as I assume it is a lot faster to do stack rolls than say STO xx and then RCL xx.

Locals persist until the calling frame they were allocated in returns. At the very top level, a return is impossible but I think RTN still deallocates them as do a few other things (Marcus's area not mine).

Locals are kept in non-volatile memory between the registers and the program space. Each local register is the equivalent of four program steps. Plus some housekeeping overhead.


Quote:While here, did you touch upon the internal housekeeping of the 34s in respect to (de)fragmentation of files? Do the files individually have to be completely contiguous?

There is no fragmentation in the 34S. Everything is kept compacted at all times. Fortunately, the CPU is very good at moving memory around and there really isn't much memory to move.
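As a rough illustration of "kept compacted at all times", here is a minimal Python sketch of a store that closes every hole by sliding the data above it down. The class `CompactStore` and its layout are hypothetical and are not taken from the 34S firmware; they only show why fragmentation cannot arise when memory is moved rather than linked.

```python
class CompactStore:
    """Toy model: variable-sized blocks stored back to back in one byte array."""

    def __init__(self, capacity):
        self.mem = bytearray(capacity)
        self.blocks = []  # (name, size) pairs in memory order

    def used(self):
        return sum(size for _, size in self.blocks)

    def offset(self, name):
        pos = 0
        for block_name, size in self.blocks:
            if block_name == name:
                return pos
            pos += size
        raise KeyError(name)

    def append(self, name, data):
        start = self.used()
        if start + len(data) > len(self.mem):
            raise MemoryError("store is full")
        self.mem[start:start + len(data)] = data
        self.blocks.append((name, len(data)))

    def read(self, name):
        start = self.offset(name)
        size = dict(self.blocks)[name]
        return bytes(self.mem[start:start + size])

    def delete(self, name):
        start = self.offset(name)
        size = dict(self.blocks)[name]
        end = self.used()
        # Slide everything stored above the hole down by `size` bytes.
        self.mem[start:end - size] = self.mem[start + size:end]
        self.mem[end - size:end] = bytes(size)  # clear the freed tail
        self.blocks = [(n, s) for n, s in self.blocks if n != name]


store = CompactStore(16)
store.append("A", b"aaa")
store.append("B", b"bb")
store.append("C", b"cccc")
store.delete("B")  # C moves down next to A; no hole is left behind
```

The cost of a delete is one memory move proportional to the data above the hole, which is cheap when the whole store is only a couple of kilobytes.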


- Pauli
03-12-2014, 03:10 AM
Post: #6
RE: [34S] Time diff. STO.00 vs. STO 00 ?
Pauli,

Thanks again for your very interesting responses. Yep, don't hit the R/S. These are indeed fascinating insights and new to me.

- Sikuq
03-12-2014, 06:12 AM
Post: #7
RE: [34S] Time diff. STO.00 vs. STO 00 ?
Sikuq,

I'd recommend a little reading in Appendix B.

d:-)
03-12-2014, 01:50 PM
Post: #8
RE: [34S] Time diff. STO.00 vs. STO 00 ?
Hi Walter,

Why is it you always try to make me feel like a little school kid who didn't do his homework and is trying to hide behind the one in front of you? Alas, I did indeed read App. B but it did not give me the answers I wanted! App. B, should the author have been awake at the helm, should have read: "The 34s memory is one linear chipset where all memory is identical in clock speed."

Actually I learn a lot from you guys and I really appreciate this forum. But, Sir - Walter, where exactly does App. B discuss processing speeds of global, local, stack, FM, etc.? Nowhere ... and you will say that it is all the same so there was no need to mention that. But how am I supposed to know that? That's why I was reading App. B in the first place.

- Sikuq
03-12-2014, 02:39 PM
Post: #9
RE: [34S] Time diff. STO.00 vs. STO 00 ?
Hey, hey, Sir, I was just recommending App. B since it's clearly stated there where the local registers live. But if you feel you've not done your homework there may be a reason, mightn't there? ;-)

BTW, may I ask you to use the button <Quote> instead of <New reply> - it makes it so much easier to detect to which post you are responding. TIA

d:-)
03-12-2014, 02:47 PM
Post: #10
RE: [34S] Time diff. STO.00 vs. STO 00 ?
(03-12-2014 02:39 PM)walter b Wrote:  Hey, hey, Sir, I was just recommending App. B since it's clearly stated there where the local registers live. But if you feel you've not done your homework there may be a reason, mightn't there? ;-)

BTW, may I ask you to use the button <Quote> instead of <New reply> - it makes it so much easier to detect to which post you are responding. TIA

d:-)

Walter,

But of course. Actually, Walter, thanks for the Quote button suggestion. I'm still new to this format. One thing I can't figure out is how to order the posts with the newest on top. As it is, I have to scroll down several pages to get to the latest stuff. Any navigational suggestions?

- Sikuq
03-12-2014, 03:10 PM
Post: #11
RE: [34S] Time diff. STO.00 vs. STO 00 ?
(03-12-2014 02:47 PM)Sikuq Wrote:  One thing I can't figure out is how to order the posts with the newest on top. As is I have to scroll down several pages to get to the latest stuff. Any navigational suggestions?

For me, threaded mode works well (though not optimally): I get the current message on top and the thread abbreviated below. Just try it. Clicking on the green arrow beams you to the first message you haven't read yet.

It's not optimum since there is no automatic navigation to the next/previous message in that thread. That's a design error IMHO (not Dave's fault). I mentioned that weeks ago and was told I should go to MyBB in that matter which I didn't do so far.

d:-/
03-12-2014, 04:19 PM
Post: #12
RE: [34S] Time diff. STO.00 vs. STO 00 ?
Walter & Franz,

Thanks for your navigational suggestions. I have tried those but as you both point out each immediately leads to just another inefficiency of some kind. I also wish the QUOTE button had been labeled ANSW or REPL. Now, I ain't bitching none, but I do wonder just why the designers didn't simply order the entries the way the three of us (and thus most likely a lot more) would have preferred it - at least as an option.

- Sikuq
03-12-2014, 04:26 PM
Post: #13
RE: [34S] Time diff. STO.00 vs. STO 00 ?
(03-12-2014 04:19 PM)Sikuq Wrote:  I also wish the QUOTE button had been labeled ANSW or REPL. Now, I ain't bitching none, but I do wonder just why the designers didn't simply order the entries the way the three of us (and thus most likely a lot more) would have preferred it - at least as an option.

Then everybody would have found the right button, this way you didn't find it again ... ;-)

Seriously, this discussion now belongs to the test forum IMHO.

d:-)
03-12-2014, 08:33 PM
Post: #14
RE: [34S] Time diff. STO.00 vs. STO 00 ?
(03-12-2014 04:26 PM)walter b Wrote:  
(03-12-2014 04:19 PM)Sikuq Wrote:  I also wish the QUOTE button had been labeled ANSW or REPL. Now, I ain't bitching none, but I do wonder just why the designers didn't simply order the entries the way the three of us (and thus most likely a lot more) would have preferred it - at least as an option.

Then everybody would have found the right button, this way you didn't find it again ... ;-)

Seriously, this discussion now belongs to the test forum IMHO.

d:-)


Walter & Franz,

I actually did not use the Quote button intentionally this time as I was attempting to answer both you and Franz at the same time thus discovering another missing add-on feature. Test forum? Maybe some other time.

I am sorry Franz if you feel I over-quoted you - that was unintentional. I do very much appreciate and learn from the high quality and detailed feedback from both of you - as always.

- Sikuq
03-12-2014, 10:14 PM (This post was last modified: 03-13-2014 06:16 PM by Marcus von Cube.)
Post: #15
RE: [34S] Time diff. STO.00 vs. STO 00 ?
Let's talk about facts again. ;-)

The Atmel processor is a very limited device. It's fast (at high clock speeds) but it's lacking memory to an extent that made Pauli and me geniuses in saving flash and RAM space. This has several impacts:

We cannot use the processor to its full extent because flash is so limited that we had to use the so-called Thumb mode, which features a reduced instruction set with a smaller memory footprint at the cost of slower execution (limited instruction and processor register set). The code is optimized for size, not for speed. There are some crude manual optimizations just to squeeze out as much user flash for library functions as possible.

The limited RAM forces us to pack everything as tightly as possible, which slows down access (e.g. bit fields for flags).

OTOH, limited means simple: no dynamic memory (and thus no memory leaks), all sizes are known, memory is moved and not linked via pointers, ...
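To make the remark about bit fields concrete, here is a small Python sketch of flags packed eight to a byte. The class name `FlagBank` and the flag count used below are illustrative assumptions, not the firmware's actual data structures. It shows the trade-off described above: one byte holds eight flags, but every access needs a shift and a mask instead of a plain load.

```python
class FlagBank:
    """Hypothetical bit-packed flag storage: 8 flags per byte."""

    def __init__(self, count):
        self.bits = bytearray((count + 7) // 8)

    def set(self, n):
        self.bits[n >> 3] |= 1 << (n & 7)

    def clear(self, n):
        self.bits[n >> 3] &= ~(1 << (n & 7)) & 0xFF

    def test(self, n):
        return bool(self.bits[n >> 3] & (1 << (n & 7)))


flags = FlagBank(100)   # 100 flags fit in 13 bytes instead of 100
flags.set(10)
print(flags.test(10), flags.test(11))
```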

To answer your questions about the different memory types in the device: there are three.

1. Non-volatile RAM, 2 KB. This is user memory, holding all the registers, flags, and the program steps which can be edited directly on the device.

2. Volatile RAM, 4 KB. This can only be used temporarily for internal operations. The contents are lost whenever the processor goes idle, that is, when user input is being awaited. We use it as the internal processor stack and for most internal calculations.

3. Flash memory, 128 KB. Most of it contains the firmware. Some KB are available to the user to back up the non-volatile RAM (2 KB) and for keystroke programs known as the library. These cannot be edited directly but can be copied to and from non-volatile RAM, or compiled externally and sent to the calculator as part of the firmware (calc_full.bin). This memory area is slightly slower than the other two but it doesn't really matter for the keystroke programs stored in flash. The impact on firmware execution speed is of more importance.

To add some info about what Pauli wrote about access to local and global registers:

Global registers move around in memory, depending on the number of registers allocated (REGS command) and whether double precision mode is in effect or not. Thus, finding a specific register needs a few instructions (shifts and adds). Local registers are similar in nature: their size is variable (double precision or not) and their location is dynamic: they live on the subroutine return stack, the same one that manages XEQ and RTN. The effort to find such a register in memory is comparable to that of finding a global register, just a few adds and shifts. The distinction between local and global is done through the internal index: 0 to 111 is global, 112 and higher is local and requires a different access path. To be precise, the range 100 to 111 for the lettered registers requires its own scheme but it deviates only minimally from that for 0 to 99. The effects on execution time should be negligible.
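The index ranges above can be sketched in a few lines of Python. Only the boundaries (0 to 99 numbered, 100 to 111 lettered, 112 and up local) come from the post; the function `register_offset`, the 8-byte and 16-byte register sizes, and the toy layout (lettered registers placed directly after the numbered ones) are assumptions made for illustration. The point is that both paths cost about the same: one comparison, one subtraction, one shift and one add.

```python
FIRST_LETTERED = 100   # from the post: 100..111 are the lettered registers
FIRST_LOCAL = 112      # from the post: 112 and higher are local registers


def register_offset(index, *, num_numbered, double_precision, local_base):
    """Return (region, byte offset) for an internal register index.

    Assumed sizes: 8 bytes per register, 16 in double precision, so the
    multiplication by the register size is a shift by 3 or 4.
    """
    size_shift = 4 if double_precision else 3
    if index >= FIRST_LOCAL:
        # Local path: offset relative to the current frame on the return stack.
        return ("local", local_base + ((index - FIRST_LOCAL) << size_shift))
    if index >= FIRST_LETTERED:
        # Lettered path: in this toy layout they follow the numbered block.
        return ("global", (num_numbered + index - FIRST_LETTERED) << size_shift)
    if index >= num_numbered:
        raise IndexError("register not allocated")
    return ("global", index << size_shift)


print(register_offset(5, num_numbered=50, double_precision=False, local_base=0))
print(register_offset(113, num_numbered=50, double_precision=False, local_base=24))
```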

When local registers are allocated, RTN is a bit more involved because it not only needs to pull the return address off the subroutine return stack but also has to skip the local register frame and needs to set up some internal pointers to reactivate the local registers of the caller. Repeating LocR commands on the same stack frame (adjusting the available number of registers) has to move some memory around, which may affect performance. The latter should be a rare case anyway.
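A toy model of that behaviour, written in Python: return addresses and local frames share one stack, RTN drops the current frame before popping the return address, and whatever frame is then on top belongs to the caller again. The class `ReturnStack` and its method names are hypothetical, and keeping the old values when LocR is repeated on the same level is an assumption of this sketch, not a documented property of the firmware.

```python
class ReturnStack:
    """Hypothetical shared stack for return addresses and local frames."""

    def __init__(self):
        self.stack = []  # entries: ("ret", address) or ("frame", [registers])

    def xeq(self, return_address):
        self.stack.append(("ret", return_address))

    def locr(self, n):
        if self.stack and self.stack[-1][0] == "frame":
            # Repeated LocR on the same level: resize the existing frame.
            regs = self.stack[-1][1]
            del regs[n:]
            regs.extend([0.0] * (n - len(regs)))
        else:
            self.stack.append(("frame", [0.0] * n))

    def locals(self):
        """Registers of the active frame, or None if this level has none."""
        if self.stack and self.stack[-1][0] == "frame":
            return self.stack[-1][1]
        return None

    def rtn(self):
        if self.stack and self.stack[-1][0] == "frame":
            self.stack.pop()            # skip this level's local frame
        if not self.stack:
            return None                 # top level: nothing to return to
        _, address = self.stack.pop()
        return address                  # caller's frame, if any, is on top again


rs = ReturnStack()
rs.locr(2)                # caller allocates two locals
rs.locals()[0] = 1.5
rs.xeq(100)               # call a subroutine, return address 100
rs.locr(3)                # subroutine gets its own locals
print(rs.rtn(), rs.locals())
```

In this model a RTN at the top level still discards the frame, which matches the behaviour Pauli suspected earlier in the thread.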

Just enjoy the features! That's why we have implemented them.

Edited for spelling and grammar.

Marcus von Cube
Wehrheim, Germany
http://www.mvcsys.de
http://wp34s.sf.net
http://mvcsys.de/doc/basic-compare.html
03-12-2014, 11:17 PM (This post was last modified: 03-13-2014 02:09 AM by Sikuq.)
Post: #16
RE: [34S] Time diff. STO.00 vs. STO 00 ?
(03-12-2014 10:14 PM)Marcus von Cube Wrote:  Let's talk about facts again. ;-)

The Atmel processor is a very limited device. It's fast (at high clock speeds) but it's lacking memory to an extent that made Pauli and me geniuses in saving flash and RAM space. This has several impacts:

We cannot use the processor to its full extent because flash is so limited that we had to use the so-called Thumb mode, which features a reduced instruction set with a smaller memory footprint at the cost of slower execution (limited instruction and processor register set). The code is optimized for size, not for speed. There are some crude manual optimizations just to squeeze out as much user flash for library functions as possible.

The limited RAM forces us to pack everything as tightly as possible, which slows down access (e.g. bit fields for flags).

OTOH, limited means simple: no dynamic memory (and thus no memory leaks), all sizes are known, memory is moved and not linked via pointers, ...

To answer your questions about the different memory types in the device: there are three.

1. Non-volatile RAM, 2 KB. This is user memory, holding all the registers, flags, and the program steps which can be edited directly on the device.

2. Volatile RAM, 4 KB. This can only be used temporarily for internal operations. The contents are lost whenever the processor goes idle, that is, when user input is being awaited. We use it as the internal processor stack and for most internal calculations.

3. Flash memory, 128 KB. Most of it contains the firmware. Some KB are available to the user to back up the non-volatile RAM (2 KB) and for keystroke programs known as the library. These cannot be edited directly but can be copied to and from non-volatile RAM, or compiled externally and sent to the calculator as part of the firmware (calc_full.bin). This memory area is slightly slower than the other two but it doesn't really matter for the keystroke programs stored in flash. The impact on firmware execution speed is of more importance.

To add some info about what Pauli wrote about access to local and global registers:

Global registers move around in memory, depending on the number of registers allocated (REGS command) and whether double precision mode is in effect or not. Thus, finding a specific register needs a few instructions (shifts and adds). Local registers are similar in nature: their size is variable (double precision or not) and their location is dynamic: they live on the subroutine return stack, the same one that manages XEQ and RTN. The effort to find such a register in memory is comparable to that of finding a global register, just a few adds and shifts. The distinction between local and global is done through the internal index: 0 to 111 is global, 112 and higher is local and requires a different access path. To be precise, the range 100 to 111 for the lettered registers requires its own scheme but it deviates only minimally from that for 0 to 99. The effects on execution time should be negligible.

When local registers are allocated, RTN is a bit more involved because it not only needs to pull the return address off the subroutine return stack but also has to skip the local register frame and needs to set up some internal pointers to reactivate the local registers of the caller. Repeating LocR commands on the same stack frame (adjusting the available number of registers) has to move some memory around, which may affect performance. The latter should be a rare case anyway.

Just enjoy the features. That's why we have implemented them.

Marcus,

I hugely appreciate this outstanding explanation. It answers most of my questions. I am impressed that you took the time to write this lengthy insight on the forum. I hope many other 34S users can read it as well.

I am a daily full-time end-user of "your" WP34S and I could not easily perform my daily chores without it. My interest in the technical details is simply in order to make the fastest and most efficient program that I can. I do indeed enjoy all of the features. Obviously, Walter's manual makes it all possible for me as well.

Thanks again to all of you involved in the 34S,

- Sikuq

PS: All those mathematical functions in the 34S libraries, just how do they work?
03-13-2014, 06:26 PM
Post: #17
RE: [34S] Time diff. STO.00 vs. STO 00 ?
(03-12-2014 11:17 PM)Sikuq Wrote:  I hugely appreciate this outstanding explanation. It answers most of my questions. I am impressed that you took the time to write this lengthy insight on the forum. I hope many other 34S users can read it as well.

Thanks for the kind words. (Please notice that there is no need to quote a large posting completely.)

(03-12-2014 11:17 PM)Sikuq Wrote:  PS: All those mathematical functions in the 34S libraries, just how do they work?

The flash library is an assortment of keystroke programs. Their commented source code can be found in the library folder on SourceForge.

All the built-in stuff can be divided in two groups:

1. Native code (compiled C) is used for the infrastructure code (which can be seen as the operating system) and the basic math stuff. Some more involved mathematical functions are coded in C for performance reasons.

2. XROM which is nothing more than keystroke programs. Most XROM code has a special environment (register space independent from user memory, double precision as default, proper handling of LastX, lack of labels in exchange for absolute jumps, etc.). If you want to learn more read the XROM sources!

Marcus von Cube
Wehrheim, Germany
http://www.mvcsys.de
http://wp34s.sf.net
http://mvcsys.de/doc/basic-compare.html
03-13-2014, 09:20 PM (This post was last modified: 03-13-2014 09:22 PM by Sikuq.)
Post: #18
RE: [34S] Time diff. STO.00 vs. STO 00 ?
PS: All those mathematical functions in the 34S libraries, just how do they work?

Quote:The flash library is an assortment of keystroke programs. Their commented source code can be found in the library folder on SourceForge.

All the built-in stuff can be divided in two groups:

1. Native code ...

Marcus,

I appreciate your response. I will research the terms. However I always wondered just how the 34S actually calculates say a square root or the Gamma function. Does it use a power series, other basic functions, or say Fourier? And did you get those functions from somewhere else and just compiled them in there?

Quote:(Please notice that there is no need to quote a large posting completely.)

Thanks for the heads up on the quotes as well. It certainly compelled me to learn new details about the effects of moving quote brackets around. ;-)


Much obliged.

- Sikuq
03-13-2014, 09:53 PM
Post: #19
RE: [34S] Time diff. STO.00 vs. STO 00 ?
(03-13-2014 09:20 PM)Sikuq Wrote:  However I always wondered just how the 34S actually calculates say a square root or the Gamma function. Does it use a power series, other basic functions, or say Fourier? And did you get those functions from somewhere else and just compiled them in there?

Square root, the four arithmetic operations, natural logarithm and exponential are included in the decNumber library we used. I replaced the natural logarithm (twice) because the supplied version was unacceptably slow, and I'm still not entirely happy with its performance. For reference, square root uses Newton's method after an initial guess -- this is a fairly standard technique.
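For readers who have not met it, Newton's method for square roots can be sketched with Python's `decimal` module, which is also decimal floating point. This is not the decNumber code; the function name `newton_sqrt`, the choice of initial guess (ten to half the decimal exponent), the five guard digits and the iteration cap are all assumptions of this sketch.

```python
from decimal import Decimal, localcontext


def newton_sqrt(value, digits=34):
    """Square root by Newton's iteration: g <- (g + x / g) / 2."""
    x = Decimal(value)
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return Decimal(0)
    with localcontext() as ctx:
        ctx.prec = digits + 5                     # a few guard digits
        guess = Decimal(10) ** (x.adjusted() // 2)  # crude initial guess
        for _ in range(100):                      # cap guards against oscillation
            nxt = (guess + x / guess) / 2
            if nxt == guess:
                break
            guess = nxt
    with localcontext() as ctx:
        ctx.prec = digits
        return +guess                             # round to the requested digits


print(newton_sqrt(2))
```

Each iteration roughly doubles the number of correct digits, so even a poor initial guess reaches 34 digits in a handful of steps.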

Pretty much everything else I wrote after finding suitable standard algorithms or in a couple of cases doing my own expansions. As an example of the former, the gamma function uses a rapidly converging series -- unfortunately I don't remember the source I used, but I will have the original paper stored away somewhere. I ended up purchasing a not-insignificant number of books about numeric analysis and transcendental function implementation and spent lots of time searching the Internet for suitable algorithms.
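The post does not say which series the 34S uses for gamma, so the following is only an example of the general idea and not the firmware's algorithm. It is a Python sketch of the Lanczos approximation, one widely used series of this kind, with a commonly published coefficient set for g = 7. The function name `lanczos_gamma` is made up for this example, and poles at zero and the negative integers are not handled.

```python
import math

# A widely published Lanczos coefficient set (g = 7, nine terms).
_G = 7
_COEFFS = [
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
]


def lanczos_gamma(x):
    """Gamma function for real x via the Lanczos approximation."""
    if x < 0.5:
        # Reflection formula maps the left half-line onto x >= 0.5.
        return math.pi / (math.sin(math.pi * x) * lanczos_gamma(1.0 - x))
    x -= 1.0
    total = _COEFFS[0]
    for i, coeff in enumerate(_COEFFS[1:], start=1):
        total += coeff / (x + i)
    t = x + _G + 0.5
    return math.sqrt(2.0 * math.pi) * t ** (x + 0.5) * math.exp(-t) * total


print(lanczos_gamma(5.0), math.gamma(5.0))
```

A handful of terms is enough for double precision here; a calculator working to 16 or 34 decimal digits would need a series and coefficients chosen for that precision.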

If you look in doc/formulas there are some erratic notes about what I used where and doc/distribution-formulas contains a longish exchange between Dieter and others giving incremental improvements to the statistical distributions. These documents were primarily for my use and aren't well organised but I figured some record was better than none.

Finally, a few of the functions have been examined in detail by members of this forum and sometimes, changes and improvements have been added. For example, Dieter's work on getting the Lambert W function accurate.


- Pauli
03-13-2014, 09:54 PM
Post: #20
RE: [34S] Time diff. STO.00 vs. STO 00 ?
In fact, you don't have to quote any text at all: Just click on <Quote>, delete all the quotation, and your response will still be positioned correctly in the thread. No secret, but it seems you haven't found out yet.

Back to topic: There are various numeric approximations to all kinds of mathematical functions. The older manuals of HP calculators usually contain some nice examples (see the museum DVD). Good ol' Abramowitz & Stegun also contains applicable formulas, for example. Just google a bit or (horribile dictu!) visit the library of the university next to you.

d:-)