Threaded Mode | Linear Mode

3298 · (This post was last modified: 10-07-2018 09:17 PM by 3298.)

That's curious indeed.
As some forum regulars might remember (despite my relatively low post frequency) I have some SysRPL experience, and with the essentials for on-board SysRPL and ASM development installed, it was easy for me to peek inside the guts of the commands and understand most of it.
HEAD is, as mentioned in the post you linked, a romptr. That's SysRPL talk for a library command; several standard UserRPL commands are like that. I'll make it short and just say that this incurs a certain performance penalty, but that penalty does not depend on what data is on the stack. A garbage collection can be triggered while resolving the romptr, but I don't think that's the reason for your measurements. You'll see why in just a moment.

Most SysRPL commands do not check if they get correct arguments - there are dedicated SysRPL commands to ensure that, and the typical implementation of a UserRPL command consists of a bunch of these checking commands followed by the commands that actually perform the operation. In HEAD's case, after confirming that at least one object is on the stack, and it is a list, this command sequence is run:

Code:

::

  DUPLENCOMP

  #0=case

  PTR 1933C

  CARCOMP

;

where CARCOMP is the command that actually does the work, and the unnamed program at address 1933C is:

Code:

::

  # 501

  ERROROUT

  (some irrelevant stuff follows; it's only here due to extreme code size optimization and cannot be reached normally due to the never-returning error handler launched by the previous command)

;

Translated into UserRPL commands, HEAD would therefore look approximately like this:

Code:

\<<

  DUP SIZE

  IF 0. ==

  THEN #501h DOERR

  END

  (actually retrieve the first element of the list here)

\>>

The crucial bit is the error checking code for empty lists. It counts the elements in the list and checks if that number is 0. In RPL a list is just a sequence of objects terminated by an end-of-list token (called SEMI, its address in the 49g and later is 0312B), similar to a null-terminated character string in C. Getting the list length has to be done like the C function strlen: walking through the list and adding 1 for any element encountered until the end token is reached. That obviously takes longer the more elements the list has.
In the end, even though HEAD should only need to access the first element in the list, it walks through the entire list.
You might wonder why it doesn't just pull the first element and then checks if it's SEMI ... the answer is "don't do that", because a SEMI on the stack can cause serious trouble, e.g. when it's decompiled for displaying or editing (the decompiler goes "what the heck ended here" and might very well crash or wreak havoc in your RAM). The proper solution would be the NULLCOMP? SysRPL command which does the zero-length check in ASM and only pushes the result of the comparison to the stack.

GET is a similar case. If it gets a list and a real number (it can process much more, e.g. a matrix and a list which should contain one or two elements, but we'll ignore that for now), then it checks if the number is 0 or lower (that's an error, of course), and finally if the number is at most the list length. How does it get the list length, you ask? Why, DUPLENCOMP of course! So the performance issue for long lists is the same as with HEAD.
The faster way would have been to use ASM code to walk through the list until either the desired index or the end is reached, which obviously avoids walking all the way to the end of a long list when the index is small. Unfortunately that ASM code would be significantly longer, and the error-generating subprogram of HEAD already showed us a glimpse of how tight the size limits were and what mad tricks the developers used to keep the size down. (As a matter of fact, some runstream abuse would have allowed writing this procedure in SysRPL as well, but at a similar size to the ASM version, and slower for high indices too.)

SUB is a rare beast. Its list-and-two-reals version just converts the numbers to system binary integers and refers to a SysRPL command without prior index-checking. Instead (believe it or not - if you don't, SUBCOMP is the command to read up about), the ASM code contained in that command performs the checks while it's doing its duty, similar to how I outlined in the paragraph about how I would have liked GET to be implemented.
With that knowledge in mind, maybe GET should have just called the internal version of SUB... that would have been possible in the same size GET's list-and-real version has now, but I don't think we'll get another ROM version, so that's a moot point anyway.

Edit: forgot to explain TAIL vs. 2. OVER SIZE SUB. While TAIL is a romptr like HEAD whereas none of the commands in the replacement sequence are, the performance difference is much easier to explain. TAIL actually does the SysRPL equivalent of 2. 1048575. SUB (note: larger-than-length stop index doesn't bother the internal version of SUB, and the UserRPL command SUB even inherits that trait), so the difference is obviously your use of OVER SIZE. Again, measuring the size of a list involves walking all the way through it, so SIZE takes a while to finish on your long list, whereas the large number used by TAIL is just a constant which gets put on the stack without any lengthy calculation. (By the way, the exact value of this constant has a meaning, it's the largest number representable by a 20-bit unsigned integer, which is what system binary integers really are.)