Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
08-03-2020, 08:34 AM (This post was last modified: 08-03-2020 08:39 AM by Martin Hepperle.)
Post: #1
Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
I wrote a small HP-49G RPL program to test a Tektronix vector graphics terminal emulator (which I would normally use with a CP/M or MS-DOS system) – but why not with a graphing calculator?
The Tektronix terminals expect each pair of 12-bit integer x-y coordinates [0…4095] in a specific format: a sequence of 5 printable characters with the following bit patterns:
Input
Code:

X: X12.X11.X10.X9.X8.X7.X6.X5.X4.X3.X2.X1
Y: Y12.Y11.Y10.Y9.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1
Output
Code:

Byte1: 0.0.1.Y12.Y11.Y10.Y9.Y8
Byte2: 0.1.1.0.Y2.Y1.X2.X1
Byte3: 0.1.1.Y7.Y6.Y5.Y4.Y3
Byte4: 0.0.1.X12.X11.X10.X9.X8
Byte5: 0.1.0.X7.X6.X5.X4.X3
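As a cross-check of the byte layout above, here is a minimal Python sketch (not calculator code and not part of the original post; the name `tek_encode` is made up for illustration). It assumes x and y are already integers in 0…4095.

```python
def tek_encode(x: int, y: int) -> str:
    """Pack 12-bit x and y into the 5-character sequence laid out above."""
    if not (0 <= x <= 4095 and 0 <= y <= 4095):
        raise ValueError("coordinates must be in 0..4095")
    byte1 = 0x20 | (y >> 7)                         # 0.0.1.Y12..Y8
    byte2 = 0x60 | ((y & 0x03) << 2) | (x & 0x03)   # 0.1.1.0.Y2.Y1.X2.X1
    byte3 = 0x60 | ((y >> 2) & 0x1F)                # 0.1.1.Y7..Y3
    byte4 = 0x20 | (x >> 7)                         # 0.0.1.X12..X8
    byte5 = 0x40 | ((x >> 2) & 0x1F)                # 0.1.0.X7..X3
    return "".join(chr(b) for b in (byte1, byte2, byte3, byte4, byte5))


print(tek_encode(1230, 1500))  # +bw)S
```

The values 1230 and 1500 are the scaled coordinates of the X=123, Y=1 example in this post.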
Because the conversion has to be performed for each data point, I would like it to be fast.
I wrote several versions of the conversion subroutine:
  1. using real numbers and local variables for X and Y
  2. using real numbers with X and Y on the stack
  3. using binary integers (with default 64 STWS) and local variables
  4. using binary integers (with 12 STWS) and local variables

Variants 3 and 4 were the slowest, and the word size made no difference in speed. Version 2, using the stack, was slightly faster than version 1 with its local variables, so version 2 is currently my favorite. Do you see more ways to speed up the code without resorting to SysRPL or other heavy tricks?

On entry to my routine TEKXY the X and Y coordinates are on the stack. In my example the ranges are X=[0…400], Y=[0…2]. Therefore Y is scaled by 1500 and X by 10 to map approximately to the 12 bit range [0…4095]. The output of the routine is a 5 character string.
Example:
Code:
2: 123     note: X
1: 1       note: Y
'TEKXY' EVAL
1: "+bw)S"

Code of variant 2 (line breaks added to show the scaling, the composition of bytes 1…5, and the final cleanup).
The first scaling step leaves X and Y swapped on the stack relative to the input, so the PICKs see this stack layout:
Code:
2: Y
1: X
Code:
'TEKXY'
<< 1500. * SWAP 10. * 
2. PICK 128. / IP 32. MOD 32. + CHR 
PICK3 4. MOD 4. * PICK3 4. MOD + 96. + CHR + 
PICK3 4. / IP 32. MOD 96. + CHR + 
2. PICK 128. / IP 32. MOD 32. + CHR + 
SWAP 4. / IP 32. MOD 64. + CHR + 
SWAP DROP
>>
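For readers who don't speak RPL, the arithmetic of variant 2 can be mirrored step by step in Python. This is only a sketch: `tekxy_real` is a made-up name, `math.floor` stands in for IP (equivalent here because all values are non-negative), and `round` stands in for whatever rounding CHR applies, which only matters if the scaled values have a fractional part.

```python
import math


def tekxy_real(x: float, y: float) -> str:
    """Mimic variant 2: real-number scaling, then / IP MOD on the results."""
    ys = y * 1500.0
    xs = x * 10.0
    ip = math.floor  # stand-in for IP on non-negative values
    codes = [
        ip(ys / 128.0) % 32 + 32,             # byte 1
        (ys % 4.0) * 4.0 + (xs % 4.0) + 96,   # byte 2 (no IP in the RPL code)
        ip(ys / 4.0) % 32 + 96,               # byte 3
        ip(xs / 128.0) % 32 + 32,             # byte 4
        ip(xs / 4.0) % 32 + 64,               # byte 5
    ]
    return "".join(chr(round(c)) for c in codes)


print(tekxy_real(123, 1))  # +bw)S
```

Note that byte 2 is the only one computed without IP, so for scaled values with a fractional part its result depends on the rounding done by CHR.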
08-03-2020, 10:23 AM
Post: #2
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
A quick reply:
  • 2. PICK is OVER
  • Why the 32. MOD? You have the highest bits already

Werner

41CV†,42S,48GX,49G,DM42,DM41X,17BII,15CE,DM15L,12C,16CE
08-03-2020, 11:06 AM
Post: #3
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Other than what Werner said I can't see anything that would make a noticeable improvement. If you know Saturn assembly language that would give you a huge speedup, probably 100x. I wouldn't be much help there, I haven't done any assembly programming since the HP-71 days.
08-03-2020, 11:12 AM (This post was last modified: 08-03-2020 11:17 AM by Martin Hepperle.)
Post: #4
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Thank you for your comments and tips.

The only thing I found in addition to the OVER and the superfluous MOD 32 was to use NIP instead of SWAP DROP at the end of TEKXY.
Each time I go through the Pocket guide I learn some new commands.

Also, I was somewhat astonished to see that the Binary Integer variant was slower than the variant with Reals.
Maybe I should try to refresh my SysRPL knowledge as a compromise - assembler is too much for me and this application.

Martin
08-03-2020, 11:24 AM
Post: #5
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Were the binary integers done with shifts and ANDs? I'm not surprised that they are slower than expected, they are variable length which is painful (but not as bad as the 16C's integer support that has carry and overflow as well).

System RPL would be a lot faster, the big win being the 20 bit short integers and much reduced overhead.


Pauli
08-03-2020, 01:37 PM (This post was last modified: 08-03-2020 02:04 PM by 3298.)
Post: #6
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Apart from the small stackrobatics improvements others have already mentioned, that's probably the fastest you can do with just UserRPL.
Oh, and the NIP can be removed entirely if you switch the last PICK3 to a ROT.
32. MOD can only be removed after the two instances of 128. / IP, in the other spots it's still needed to throw the high bits away.
Side note: the OVER improvement made me think of the signature of fellow forum member HP67: "It ain't OVER 'till it's 2 PICK". Such a memorable signature sure makes remembering this command equivalence easier. Wink
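On the question of which 32. MOD instances are removable, a brute-force check over all 12-bit values makes the point. This is a Python sketch for illustration only, not calculator code.

```python
# For any 12-bit value v (0..4095):
#   v >> 7 is at most 31, so "32. MOD" after "128. / IP" changes nothing;
#   v >> 2 can reach 1023, so "32. MOD" after "4. / IP" is still required.
redundant = all((v >> 7) % 32 == (v >> 7) for v in range(4096))
required = any((v >> 2) % 32 != (v >> 2) for v in range(4096))
print(redundant, required)  # True True
```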

I also toyed with a user binary integer version (word size doesn't matter as long as it's at least 12: all computations are 64-bit internally, with a truncation to the configured word size afterwards). Binary integers bring a bit-wise AND to the table, which is likely a little faster than MOD (32. MOD becomes #1Fh AND, 4. MOD becomes #3h AND), plus bit-wise OR as a + replacement if we like (performance should be equal there); they also have bit-shifts, which is probably faster than real number division too (and taking the integer part becomes superfluous, of course), but the shift amounts are hard-wired into the respective commands at a bit for certain commands or a byte for the rest, meaning multiple may be needed for arbitrary shifts. (In this instance two commands each, as 128. / IP becomes SL SRB, 4. / IP becomes SR SR, and 4. * becomes SL SL.) However, binary integers are (for whatever insane reason) not accepted for CHR, so we lose all performance gains to back-and-forth conversion. At least the conversion at the start of the program can be made implicit by simply translating your scaling factors into binary integers (#1500d and #10d, we can take advantage of the base suffix to keep things understandable). I also translated the fixed parts added just before CHR into binary, no reason to lose additional performance by making + convert them at runtime. Unfortunately, the back-conversion cannot be made implicit by leaving the + argument as real, like we could with the multiplication at the start... + goes binary if any argument is binary.
Result:
Code:
\<< #1500d * SWAP #10d *
  OVER SL SRB #20h OR CHR
  PICK3 #3h AND SL SL PICK3 #3h AND OR #60h OR CHR +
  ROT SR SR #1Fh AND #60h OR CHR +
  OVER SL SRB #20h OR CHR +
  SWAP SR SR #1Fh AND #40h OR CHR +
\>>
In my tests, it's actually slightly slower than the real number version. What a shame.
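The command equivalences used in that translation can be checked in a few lines of Python. This is a sketch under the assumption of the default 64-bit word size; the helper names `sl`, `sr` and `srb` simply mirror the RPL commands and are not a real library.

```python
WORD_MASK = (1 << 64) - 1  # assumed default 64-bit word size


def sl(v: int) -> int:
    return (v << 1) & WORD_MASK   # SL: shift left one bit


def sr(v: int) -> int:
    return v >> 1                 # SR: shift right one bit


def srb(v: int) -> int:
    return v >> 8                 # SRB: shift right one byte


for v in range(4096):
    assert srb(sl(v)) == v // 128   # SL SRB   <->  128. / IP
    assert sr(sr(v)) == v // 4      # SR SR    <->  4. / IP
    assert sl(sl(v)) == v * 4       # SL SL    <->  4. *
    assert v & 0x1F == v % 32       # #1Fh AND <->  32. MOD
    assert v & 0x03 == v % 4        # #3h AND  <->  4. MOD
```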

When aiming for performance, though, I think SysRPL isn't just the heavy equipment, it's the right tool for the job. It leaves out all that slow error-checking, and we can use extremely fast BINTs for just about everything. There's #AND for them, and #>CHR as the equivalent of UserRPL CHR does take a BINT. (Yes, we need to CHR>$ the first one, but subsequent chars can just be appended to the string with >T$ without such a conversion.) No larger shifts, but 1-bit shifts are at the core of the Saturn ASM implementations of #2* and #2/. (There's also #8* with three 1-bit shifts, but we only need two, and there's no #4* unfortunately.)
A more-or-less direct translation into SysRPL is therefore:
Code:
::
  CK2&Dispatch
  REALREAL
  ::
    COERCE2
    1500 #* SWAP BINT10 #*
    OVER #2/ #2/ #2/ #2/ #2/ #2/ #2/ BINT32 #+ #>CHR CHR>$
    3PICK BINT3 #AND #2* #2* 3PICK BINT3 #AND #+ BINT96 #+ #>CHR >T$
    ROT #2/ #2/ BINT31 #AND BINT96 #+ #>CHR >T$
    OVER #2/ #2/ #2/ #2/ #2/ #2/ #2/ BINT32 #+ #>CHR >T$
    SWAP #2/ #2/ BINT31 #AND BINT64 #+ #>CHR >T$
  ;
;
If we want to be naughty and take advantage of the lack of type checking and the fact that BINTs look like CHARs with a different (ignored) prolog and three additional nibbles of data at the end (the high-order nibbles, importantly), we could just skip the #>CHR commands, so that CHR>$ and all instances of >T$ simply read the BINTs as CHARs (with the same outcome). Really hacky, but shorter and slightly faster, so why not. Wink Compared to the UserRPL variant, we're already about 3 times as fast (at least in my tests, though I'm trying this in x49gp instead of a real calculator due to convenience).

Those chains of seven #2/ bug me though. How about something that uses arithmetic in place of bitwise logic? #/ gives us both the quotient and the remainder, which could be quite useful: we can split a number into 5 high bits and 7 low bits with it (BINT128 #/), then split the low 2 bits off the latter with BINT4 #/ ... UserRPL does have something similar in some CAS command, I think it was IDIV2, but I'm more of a SysRPL person, so I don't quite remember that stuff. Oh well. Here's a SysRPL version then.
Code:
::
  CK2&Dispatch
  REALREAL
  ::
    COERCE2
    1500 #* SWAP BINT10 #*
    BINT128 #/ BINT32 #+ #>CHR CHR>$
    SWAP BINT4 #/ BINT64 #+ #>CHR ROTSWAP >T$
    ROT BINT128 #/ BINT32 #+ #>CHR 4UNROLL
    BINT4 #/ BINT96 #+ #>CHR ROTSWAP >H$
    UNROT #2* #2* #+ BINT96 #+ #>CHR >H$
    SWAP >H$
  ;
;
Even faster than the previous SysRPL version, and we can of course pull the same hack with removing #>CHR.
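The quotient/remainder idea is easy to try out in Python, where `divmod` plays the role of #/ (it returns quotient first, then remainder, so the order on the stack differs). The function names below are made up for this sketch.

```python
def split_12bit(v: int):
    """Split a 12-bit value into (high 5 bits, middle 5 bits, low 2 bits)."""
    high5, low7 = divmod(v, 128)   # like BINT128 #/
    mid5, low2 = divmod(low7, 4)   # like BINT4 #/
    return high5, mid5, low2


def tek_divmod(x: int, y: int) -> str:
    xh, xm, xl = split_12bit(x)
    yh, ym, yl = split_12bit(y)
    codes = (32 + yh, 96 + 4 * yl + xl, 96 + ym, 32 + xh, 64 + xm)
    return "".join(map(chr, codes))


print(split_12bit(1500))        # (11, 23, 0)
print(tek_divmod(1230, 1500))   # +bw)S
```

No masking is needed anywhere, because each remainder is already confined to the right number of bits.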

By the way, what's the deal with the scaling factors? The SysRPL versions both (and the UserRPL binary integer version) convert the input numbers to integers before multiplying, which may result in a loss of precision if there is a fractional part. That can be fixed by replacing:
Code:
COERCE2 1500 #* SWAP BINT10 #*
with
Code:
% 1500. %* SWAP %10 %* COERCE2
Otherwise, if only integer numbers go in, you might consider changing the 1500 to the nearby 1536 - similar magnitude, so the visual result should be similar, but in binary it's #11000000000b, and the nine trailing zeroes in the binary representation are of course preserved across the integer multiplication, which means Y1...Y9 are always 0. That in turn can be used to optimize some parts: the third byte will always be #01100000b (#96d = '`'), and the second byte only depends on X (and with the scaling factor for X being even but not divisible by 4, the only bit that can change there is X2, making the value #01100000b = #96d = '`' if X is even, or #01100010b = #98d = 'b' if X is odd). In the second SysRPL program that would replace these lines:
Code:
BINT4 #/ BINT96 #+ #>CHR ROTSWAP >H$
UNROT #2* #2* #+ BINT96 #+ #>CHR >H$
with
Code:
DROP CHR \60 >H$
SWAP BINT96 #+ #>CHR >H$
Even if it has to be 1500, the integer multiplication with something divisible by 4 (like 1500) makes the lowest two bits of Y always 0. That means the following replacement for the same two lines still works:
Code:
BINT4 #/ BINT96 #+ #>CHR SWAPDROP >H$
SWAP BINT96 #+ #>CHR >H$
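The claims about constant bytes can be confirmed by enumeration. This Python sketch assumes integer inputs in the example ranges from the first post (X = 0…400, Y = 0…2); `byte2` and `byte3` are illustrative helper names.

```python
def byte2(x: int, y: int) -> int:
    return 0x60 | ((y & 3) << 2) | (x & 3)


def byte3(y: int) -> int:
    return 0x60 | ((y >> 2) & 0x1F)


# Integer inputs in the example ranges: X = 0..400 (scaled by 10), Y = 0..2.
xs = [10 * x for x in range(401)]

# Scale factor 1500 (divisible by 4): Y1 and Y2 are always 0,
# so byte 2 depends only on X and takes just two values.
vals_1500 = {byte2(x, 1500 * y) for x in xs for y in range(3)}
print(sorted(vals_1500))   # [96, 98]

# Scale factor 1536 (= 3 * 2**9): Y1..Y9 are always 0, so byte 3 is constant.
vals_1536 = {byte3(1536 * y) for y in range(3)}
print(sorted(vals_1536))   # [96]
```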
08-04-2020, 06:45 AM
Post: #7
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Hello,

You might be able to gain something by doing the OR and MOD as string OR/AND operations, which saves a lot of work (I do seem to remember that you can do logical operations on strings. Hope I am correct).
As in:

'TEKXY'
<< 1500. *
DUP 128. / CHR @ byte 1
OVER 4. / CHR @ byte 3
4. ROLLD 10. *
DUP 128. / CHR @ byte 4
OVER 4. / CHR @ byte 5
@ stack is Y b1 b2 X b4 b5
ROT 4. MOD 6. ROLLD 4. MOD 4. * + CHR UNROT
+ + + +
"?????" AND @ This is a specially formed string with 5 chr 31. Create it on the stack and then edit it to have it in "hard" in your program without having to create it.
" `` @" OR
>>

I have not tested it; it is kind of a proof of concept. Also, my knowledge of ROLL and the like is WAY out of date, so I might have the rotation direction wrong in some cases...
However, should this technique work, it should be quite fast.
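The same mask idea, sketched in Python with byte strings (an illustration, not a tested port of the RPL above). It uses integer shifts, so no rounding is involved, and it assumes that only the low 8 bits of each value end up in the character, which `& 0xFF` makes explicit here. The name `tek_masked` is hypothetical.

```python
def tek_masked(x: int, y: int) -> str:
    # Raw character codes with stray high bits still present.
    raw = bytes(v & 0xFF for v in (
        y >> 7,                      # byte 1 payload
        ((y & 3) << 2) | (x & 3),    # byte 2 payload
        y >> 2,                      # byte 3 payload
        x >> 7,                      # byte 4 payload
        x >> 2,                      # byte 5 payload
    ))
    and_mask = bytes([31] * 5)       # the string of five chr 31
    or_mask = b" `` @"               # fixed bits: 32, 96, 96, 32, 64
    out = bytes((r & a) | o for r, a, o in zip(raw, and_mask, or_mask))
    return out.decode("ascii")


print(tek_masked(1230, 1500))  # +bw)S
```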

Cyrille

Although I work for the HP calculator group, the views and opinions I post here are my own. I do not speak for HP.
08-04-2020, 07:07 AM
Post: #8
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Hello Cyrille.
CHR unfortunately rounds its argument, so you'll need 4 more IPs, at least.
Cheers, Werner
08-04-2020, 08:23 AM
Post: #9
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
The shortest I have been able to come up with so far: 177.5 bytes, #9FBDh
Code:
@ Y/1500 X/10
\<<
 11.71875 * DUP 32. + IP CHR
 ROT 2.5 * ROT FP 3. + 32. * ROT PICK3 FP PICK3 FP 6. + 4 * + 4. * CHR + 
 SWAP IP CHR +
 32. ROT OVER / DUP2 + IP CHR 
 UNROT FP 2. + * IP CHR + + 
\>>
Cheers, Werner
08-04-2020, 09:45 AM (This post was last modified: 08-04-2020 09:53 AM by Martin Hepperle.)
Post: #10
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Thank you very much for your ideas and solutions!

Especially the SysRPL version seems to be the most attractive to me.
I also toyed a bit with SysRPL (I must be considered a perfect noob in this black art) but ended up only with a 25% improvement in speed. But I used many conversions to HXS numbers to get bitOr and bitAnd, which was unnecessary. And it took me 2 hours to learn that I need a "11" or "REALREAL" at the start of the routine instead of the "BINT1" I had...

The background with the scaling factors is this:
The Tektronix terminals use a 12 bit integer range (0...4095) for x and y.
In my practical application I have to scale from my arbitrary, real valued user coordinate system to this integer range.
That's why in my example case I used 1500 and 10 as scaling factors. In real life these factors depend on the data range to be plotted and they are always needed.
[For real-world scaling the x-values would be scaled by 4095*(x-xMin)/(xMax-xMin) for a full-width plot].
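Written out as a small Python helper, that scaling might look like the following. This is a sketch only: the name `to_device`, the clamping of out-of-range points and the use of `round` are my own choices for illustration, not something the calculator program is known to do.

```python
def to_device(v: float, v_min: float, v_max: float, full_scale: int = 4095) -> int:
    """Map v in [v_min, v_max] linearly onto 0..full_scale."""
    if v_max == v_min:
        raise ValueError("empty data range")
    t = (v - v_min) / (v_max - v_min)
    t = min(1.0, max(0.0, t))        # clamp out-of-range points (a design choice)
    return int(round(full_scale * t))


print(to_device(0, 0, 400), to_device(100, 0, 400), to_device(400, 0, 400))  # 0 1024 4095
```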

However, in the end everything goes through the serial interface at 9600 baud, so there are more speed limits in the chain. The end result is that the HP 48/49 can send vector graphics to an external display (a hardware terminal or software emulator like Windows TeraTerm).
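For a rough feel of that serial limit, here is the arithmetic in Python. The framing is an assumption (8 data bits plus start and stop bit, i.e. 10 bits on the wire per character); the post does not state the line settings.

```python
BAUD = 9600
BITS_PER_CHAR = 10      # assumption: 8 data bits + start + stop (8N1)
CHARS_PER_POINT = 5     # one 12-bit coordinate pair

chars_per_second = BAUD / BITS_PER_CHAR
points_per_second = chars_per_second / CHARS_PER_POINT
print(chars_per_second, points_per_second)  # 960.0 192.0
```

Under that assumption the link itself tops out at roughly 190 points per second, regardless of how fast the conversion routine runs.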

Martin
08-04-2020, 11:40 PM
Post: #11
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Also tried my hand at a Saturn ASM implementation. Things probably won't get much faster than that ... though I left the error checking and integer conversion in SysRPL. Those two commands are themselves implemented in Saturn ASM anyway, so there's not much to be gained by replacing them with my own ASM. You'll have to do the scaling before calling this; but because you apparently want to change that on a per-application basis anyway, that's probably for the better. Leaving that in real number land also avoids those pesky accuracy issues.
Code:
::
  CK2&Dispatch
  REALREAL
  ::
    COERCE2
    CODE
      GOSBVL POP2#
      R0=C.A
      GOSBVL SAVPTR
      C=R0.A
      B=C.B
      BSL.X
      BSL.A
      BSL.A
      B=C.X
      BSRB.X
      P=6
      BSL.WP
      B=A.B
      BSL.WP
      P=0
      B=A.P
      B+B.W
      B+B.W
      CBIT=0.2
      CBIT=0.3
      B+C.P
      BSL.W
      BSL.W
      ASR.X
      A+A.X
      ASR.X
      B=A.B
      LA 4020606020
      LC 1F1F1F0F1F
      P=9
      B&C.WP
      A!B.WP
      GOSBVL GETPTR
      GOSBVL PUSHhxs
      C=DAT1.A
      CD1EX
      LA(5) DOCSTR
      DAT1=A.A
      D1=C
      A=DAT0.A
      D0+5
      PC=(A)
    ENDCODE
  ;
;
This one takes Y on level 2 and X on level 1. Might be handy if you multiply one, SWAP, multiply the other without another SWAP, but if you don't want that, replace all occurrences of B=C with B=A and vice versa, and also replace both occurrences of CBIT with ABIT and the B+C just below that to B+A. Those modifications reverse the parameters without any performance penalty.
By the way, TEVAL places the performance of this just ahead of the SysRPL version, but it's lying (it measures mostly its own overhead). Put it in a loop running a few dozen times, measure that, and divide by the number of iterations, and you'll see.

The way it works is to build the string in a 64-bit register (think of a user binary integer) via repeated shifts and loading one to three nibbles of a number into the low part of the register. (The Saturn CPU's register fields are very handy for this.) The processing order is from last to first byte due to little-endian byte order in memory (see below for why that's important), and because it's easier to work in the low section of the register and then shift the results out into the higher part.
After that, mask out the areas where fixed bits should be (the 0 in the mask 1F1F1F0F1F is no accident, that's the fourth-highest bit in the second byte which shall be fixed 0), put in the fixed 1-bits with another mask, and finally perform a small trick for the output: store it as a hxs (also known as user binary integer) with only 40 bits - and just change the prolog to a string's, because their format is essentially the same; the lowest byte becomes the first character in the string due to the aforementioned byte order.
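The register trick described above can be modelled in Python with one wide integer. This is a simplified sketch, not a trace of the actual Saturn instructions: it loads each payload as a whole byte (stray high bits included on purpose), then applies the two masks from the listing and emits the bytes in little-endian order. `tek_register` is a made-up name.

```python
AND_MASK = 0x1F1F1F0F1F   # keeps the payload bits of each byte
OR_MASK = 0x4020606020    # fixed bits: 40h, 20h, 60h, 60h, 20h (high to low byte)


def tek_register(x: int, y: int) -> bytes:
    reg = 0
    # Load the last character first and shift the earlier ones in below it.
    for payload in (
        x >> 2,                      # byte 5
        x >> 7,                      # byte 4
        y >> 2,                      # byte 3
        ((y & 3) << 2) | (x & 3),    # byte 2
        y >> 7,                      # byte 1
    ):
        reg = (reg << 8) | (payload & 0xFF)   # stray high bits are left in
    reg = (reg & AND_MASK) | OR_MASK
    return reg.to_bytes(5, "little")          # lowest byte becomes the first character


print(tek_register(1230, 1500))  # b'+bw)S'
```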
08-05-2020, 06:29 AM
Post: #12
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Hello,

>Integer conversion in SysRPL. Those two commands are themselves implemented in Saturn ASM anyway, so there's not much to be gained by replacing them with my own ASM

Actually, you would gain a lot by doing them yourself...
The reason for it is that, it is not the conversion which is long, but the memory allocation for the result. Mallocs are slow on the saturn because they require memory movement (of the RPL return stack). And memory moves are slow!!!!*

So, reducing the number of object creation is the main speedup on any RPL program in the 49.

Cyrille
*Come to think of it, I think that the malloc calls have been reimplemented directly in C in the ARM-based series... hence helping a lot to speed things up.
08-05-2020, 06:36 AM
Post: #13
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Careful with =PUSHhxs, that may cause a garbage collection... if your routine is in TEMPOB when it is executed (as is the case when it resides in a library in a covered port, for instance), and a garbage collection occurs, it will crash.
One easy way to avoid it is to put a 5-char dummy string on the stack, UNROT it, then read and treat the arguments and overwrite the string.
Cheers, Werner
08-05-2020, 09:45 AM (This post was last modified: 08-05-2020 09:47 AM by Martin Hepperle.)
Post: #14
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
With the SysRPL version I achieve a speedup of about 2.5 compared to the UserRPL version.
This is fine for me and keeps me from falling asleep while watching a plot appear on the screen.

My demo program now uses the functions Y1(X) and Y2(X) and the scaling as set up in the HP 49G plot application.
Therefore I replaced the hardwired scaling factors by data derived from the plot parameter list PPAR.

An initializing program calculates the translation values and the scale factors and places them on the stack.
In my routines I now pick the scaling parameters from the stack and perform the Real transformations (x-X0)*sx and (y-y0)*sy before COERCing them into BINTS and composing the bytes in Tektronix format.
This seems to be the most efficient way to perform these transforms (except reverting to Assembler, of course).

The nice thing is that I can now reproduce the same graph on the HP 49G as well as on my Tektronix emulator.

Thank you again for your help, which also triggered me to learn more about SysRPL.

Martin

PS: I must figure out first how the assembler version works before I climb up (or is it down?) to that level...

[attachment=8664]

Code:
ASSEMBLE 
  NIBASC /HPHP49-C/
                                                   ( TEKXY.s )
                                                   ( Bring X and Y into Tektronix format. )
                                                   ( Stack also contains the transformation  )
                                                   ( parameters extracted from TEKPAR. )
                                                   ( These map from User space to device space, )
                                                   ( which is integer in 4096x3072. )
RPL                                                
::                                                 ( 1:Y         2:X         3:SY 4:SX 5:Y0 6:X0 )                                         
  CK2&Dispatch
  REALREAL                                         ( need at least 2 Reals on the stack )
  ::
    ( translate and scale Real X, Y values )
    5PICK %-                                       ( 1:Y-Y0      2:X         3:SY 4:SX 5:Y0 6:X0 )
    3PICK %*                                       ( 1:[Y-Y0]*SY 2:X         3:SY 4:SX 5:Y0 6:X0 )
    SWAP                                           ( 1:X         2:[Y-Y0]*SY 3:SY 4:SX 5:Y0 6:X0 )
    BINT6 PICK %-                                  ( 1:X-X0      2:[Y-Y0]*SY 3:SY 4:SX 5:Y0 6:X0 )
    4PICK %*                                       ( 1:[X-X0]*SX 2:[Y-Y0]*SY 3:SY 4:SX 5:Y0 6:X0 )
    ( now bring these 2 integer values IX, IY into Tektronix format: a 5-character string )
    COERCE2                                        ( convert IX, IY values to BINT )
                                                   ( note: #/ leaves r and q on stack )
    BINT128 #/ BINT32 #+ #>CHR CHR>$               ( C4: 0.0.1.X12.X11.X10.X9.X8, to string  )
    SWAP BINT4 #/ BINT64 #+ #>CHR ROTSWAP >T$      ( C5: 0.1.0.X7.X6.X5.X4.X3, append to C4: C4&C5 )
    ROT BINT128 #/ BINT32 #+ #>CHR 4UNROLL         ( C1: 0.0.1.Y12.Y11.Y10.Y9.Y8 )
    BINT4 #/ BINT96 #+ #>CHR ROTSWAP >H$           ( C3: 0.1.1.Y7.Y6.Y5.Y4.Y3, prepend to C4&C5: C3&C4&C5 )
    UNROT #2* #2* #+ BINT96 #+ #>CHR >H$           ( C2: 0.1.1.0.Y2.Y1.X2.X1, prepend to C3&C4&C5: C2&C3&C4&C5 )
    SWAP >H$                                       ( finally: C1&C2&C3&C4&C5 )
    ( leaves a concatenated string of 5 characters: )                                      
    ( C1: [0.0.1.Y12.Y11.Y10.Y9.Y8] )
    ( C2: & [0.1.1.0.Y2.Y1.X2.X1] )
    ( C3: & [0.1.1.Y7.Y6.Y5.Y4.Y3] )
    ( C4: & [0.0.1.X12.X11.X10.X9.X8] )
    ( C5: & [0.1.0.X7.X6.X5.X4.X3] )
  ;
;
08-05-2020, 10:40 AM (This post was last modified: 08-05-2020 11:42 AM by 3298.)
Post: #15
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
(08-05-2020 06:29 AM)cyrille de brébisson Wrote:  Actually, you would gain a lot by doing them yourself...
The reason for it is that, it is not the conversion which is long, but the memory allocation for the result. Mallocs are slow on the saturn because they require memory movement (of the RPL return stack). And memory moves are slow!!!!*

So, reducing the number of object creation is the main speedup on any RPL program in the 49.

Cyrille
*Come to think of it, I think that the malloc calls have been reimplemented directly in C in the ARM-based series... hence helping a lot to speed things up.
Uhm, good point... I guess I'll have to learn how to COERCE reals to binary integers in Saturn ASM.
(08-05-2020 06:36 AM)Werner Wrote:  Careful with =PUSHhxs, that may cause a garbage collection... if your routine is in TEMPOB when it is executed (as is the case when it resides in a library in a covered port, for instance), and a garbage collection occurs, it will crash.
One easy way to avoid it is to put a 5-char dummy string on the stack, UNROT it, then read and treat the arguments and overwrite the string.
Cheers, Werner
Doesn't that apply to pushing BINTS > 131 too? That would make pretty much any ASM program pushing stuff other than already existing objects unusable from covered ports (or freshly compiled on the stack and not stored in USEROB yet). I would have thought that garbage collection adjusts the RSTK entries if necessary, so it can return into the moved code.
If not, I could indeed have the surrounding SysRPL push a dummy string (best practice would be to TOTEMPOB it too; while its contents don't matter as they'll get overwritten, changing the copy embedded in the SysRPL program would change checksums of directories or libraries in port 0). Or maybe have the code copy its end starting from the PUSHhxs call into that scratch area at 80100 so it can run in peace. (By the way, does anybody know how long that scratch area is? It's obviously sufficient for a few instructions, but knowing its length could be handy in other cases.)
(08-05-2020 09:45 AM)Martin Hepperle Wrote:  
Code:
  CK2&Dispatch
  REALREAL                                         ( need at least 2 Reals on the stack )
  ::
    ( translate and scale Real X, Y values )
    5PICK %-                                       ( 1:Y-Y0      2:X         3:SY 4:SX 5:Y0 6:X0 )
Ouch, you're checking for only two parameters (which have to be reals), but then you're blindly accessing stack level five. That's gonna blow up in your face when you start this program with less than five parameters, or when the upper ones aren't of the right type (looks like you want them to be reals too). A bit further down you access level six too (by the way, for that access you can use 6PICK instead of BINT6 PICK, that command does exist).
Perhaps try the basic CK&Dispatch1 instead (which doesn't check for parameter presence; the CK<n> family only goes to five parameters, so you'll have to do it manually anyway, something like DEPTH BINT6 #<case SETSTACKERR before the CK&Dispatch1), change the REALREAL (value: 11h) to something like # 11111 so it checks all five levels it can deal with, then also add code to manually check the type of the sixth level (perhaps 6PICK TYPEREAL? NOTcase SETTYPEERR, or alternatively, for the automatic number type conversion from ZINTs, a second CK&Dispatch1, like this: 6ROLL CK&Dispatch1 BINT1 :: 6UNROLL).

I'll also help you along with understanding how I made the Saturn code build the string. I should've commented it like this from the beginning... (Also, sorry about the weird characters that might show up in the longer comment lines, the forum is responsible for that and I can't remove them.)
Code:
::
  CK2&Dispatch
  REALREAL
  ::
    COERCE2
    CODE
      GOSBVL POP2# ; X in C.A, Y in A.A
      R0=C.A ; SAVPTR uses C.A, so it has to be backed up
      GOSBVL SAVPTR
      C=R0.A
      B=C.B ; B = X8.X7.X6.X5.X4.X3.X2.X1
      BSL.X ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.0.0.0
      BSL.A ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.0.0.0.0.0.0.0
      BSL.A ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.0.0.0.0.0.0.0.0.0.0.0
      B=C.X ; B = X8.X7.X6.X5.X4.X3.X2.X1.X12.X11.X10.X9.X8.X7.X6.X5.X4.X3.X2.X1
      BSRB.X ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.X5.X4.X3.X2
      P=6
      BSL.WP ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.X5.X4.X3.X2.0.0.0.0
      B=A.B ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1
      BSL.WP ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.0.0.0.0
      P=0
      B=A.P ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1
      B+B.W ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.0
      B+B.W ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.0.0
      CBIT=0.2 ; clear X3 so it doesn't interfere with Y1
      CBIT=0.3 ; same for X4
      B+C.P ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.X2.X1
      BSL.W ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.X2.X1.0.0.0.0
      BSL.W ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.X2.X1.0.0.0.0.0.0.0.0
      ASR.X ; A.X = 0.0.0.0.Y12.Y11.Y10.Y9.Y8.Y7.Y6.Y5
      A+A.X ; A.X = 0.0.0.Y12.Y11.Y10.Y9.Y8.Y7.Y6.Y5.0
      ASR.X ; A.X = 0.0.0.0.0.0.0.Y12.Y11.Y10.Y9.Y8
      B=A.B ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.X2.X1.0.0.0.Y12.Y11.Y10.Y9.Y8
      LA 4020606020
      LC 1F1F1F0F1F
      P=9
      B&C.WP ; B = 0.0.0.X7.X6.X5.X4.X3.0.0.0.X12.X11.X10.X9.X8.0.0.0.Y7.Y6.Y5.Y4.Y3.0.0.0.0.Y2.Y1.X2.X1.0.0.0.Y12.Y11.Y10.Y9.Y8
      A!B.WP ; B = 0.1.0.X7.X6.X5.X4.X3.0.0.1.X12.X11.X10.X9.X8.0.1.1.Y7.Y6.Y5.Y4.Y3.0.1.1.0.Y2.Y1.X2.X1.0.0.1.Y12.Y11.Y10.Y9.Y8
      GOSBVL GETPTR
      GOSBVL PUSHhxs ; the object is now laid out like this: (5) DOHXS (5) length=0000Fh (10) data
; data first byte: 0.0.1.Y12.Y11.Y10.Y9.Y8
; data second byte: 0.1.1.0.Y2.Y1.X2.X1
; data third byte: 0.1.1.Y7.Y6.Y5.Y4.Y3
; data fourth byte: 0.0.1.X12.X11.X10.X9.X8
; data fifth byte: 0.1.0.X7.X6.X5.X4.X3
      C=DAT1.A
      CD1EX
      LA(5) DOCSTR
      DAT1=A.A ; overwrite the prolog so it turns from DOHXS to DOCSTR
      D1=C
      A=DAT0.A
      D0+5
      PC=(A)
    ENDCODE
  ;
;
08-06-2020, 12:30 AM
Post: #16
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
(08-04-2020 09:45 AM)Martin Hepperle Wrote:  The background with the scaling factors is this:
The Tektronix terminals use a 12 bit integer range (0...4095) for x and y.

Since you are scaling, you could run your Tektronix in its 'backwards compatible' mode by sending the 4-character codes used by the previous 1024x1024 resolution model (which it should auto-detect). These are much simpler to calculate.

The following code expects X,Y on the stack in the range 0..1023 and uses list processing to speed up ordinary userRPL.

Code:
<< DUP ROT DUP 4 ->LIST R->B
{ 32 1 32 1 } / # 31d AND
{ 32 96 32 64 } ADD
B->R CHR  << + >> STREAM >>

The other option to consider would be to generate your coordinates as an N x 2 matrix or list and use list processing to calculate a string of N codes all at once. I'll see if I can adapt the above when I have a spare moment.
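The list operations in the program above translate almost one-to-one into Python, which also makes the batch idea easy to picture. This is only a sketch: the function names are made up, and the byte order and constants are read off the UserRPL lists rather than taken from Tektronix documentation.

```python
def tek_encode_10bit(x: int, y: int) -> str:
    """4-character form for 10-bit coordinates, mirroring the list operations above."""
    values = [y, y, x, x]
    divisors = [32, 1, 32, 1]      # { 32 1 32 1 } /
    offsets = [32, 96, 32, 64]     # { 32 96 32 64 } ADD
    codes = [((v // d) & 31) + o for v, d, o in zip(values, divisors, offsets)]
    return "".join(map(chr, codes))


def tek_encode_many(points) -> str:
    """Concatenate the codes for a sequence of (x, y) pairs."""
    return "".join(tek_encode_10bit(x, y) for x, y in points)


print(tek_encode_10bit(100, 200))            # &h#D
print(tek_encode_many([(0, 0), (100, 200)]))
```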
08-06-2020, 06:15 AM
Post: #17
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
About garbage collection:
Your ASM routine, even if in the heap, will not be garbage collected because its address is in the return stack. As a result, it will be marked as in use and kept. So there is no risk.

Nice list-based version. It will be shorter in space, but most likely slower to execute, as list processing is done in SysRPL and generates memory allocs...

Cyrille
08-06-2020, 06:28 AM (This post was last modified: 08-06-2020 06:43 AM by Werner.)
Post: #18
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
(08-05-2020 10:40 AM)3298 Wrote:  Doesn't that apply to pushing BINTS > 131 too? That would make pretty much any ASM program pushing stuff other than already existing objects unusable from covered ports (or freshly compiled on the stack and not stored in USEROB yet). I would have thought that garbage collection adjusts the RSTK entries if necessary, so it can return into the moved code.
If not, I could indeed have the surrounding SysRPL push a dummy string (best practice would be to TOTEMPOB it too; while its contents don't matter as they'll get overwritten, changing the copy embedded in the SysRPL program would change checksums of directories or libraries in port 0). Or maybe have the code copy its end starting from the PUSHhxs call into that scratch area at 80100 so it can run in peace. (By the way, does anybody know how long that scratch area is? It's obviously sufficient for a few instructions, but knowing its length could be handy in other cases.)

Garbage collection adjusts the SYSRPL pointers, but not the ML return stack. So if your code moves because of a gc (because it is in TEMPOB), the return will be to a wrong address.
Any PUSH may cause a gc. PUSHing a BINT<132 onto the stack still uses up the 5 nibbles of a data stack entry, and may cause garbage collection.
BUT: if the PUSH is the last instruction in your CODE object (so it has to be combined with =Loop, like for instance PUSH#LOOP), then there is no return to the code object, but to the next SysRPL object in the runstream, and SYSRPL is impervious to gc's.
And you are absolutely right that the dummy string must be TOTEMPOB'd !! Forgot about that, it's been a while..

Cheers, Werner

41CV†,42S,48GX,49G,DM42,DM41X,17BII,15CE,DM15L,12C,16CE
08-06-2020, 06:35 AM (This post was last modified: 08-06-2020 06:35 AM by Werner.)
Post: #19
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
(08-06-2020 06:15 AM)cyrille de brébisson Wrote:  About garbage collection:
Your ASM routine, even if it is in the heap, will not be garbage collected because its address is on the return stack. As a result, it will be marked as in use and kept, so there is no risk.
Cyrille
There is no risk of it being removed, but there is a risk of it being *moved* within TEMPOB. But I should not be explaining this to you, of all people ;-)
Werner

41CV†,42S,48GX,49G,DM42,DM41X,17BII,15CE,DM15L,12C,16CE
08-06-2020, 09:17 AM
Post: #20
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
(08-05-2020 10:40 AM)3298 Wrote:  ...
(08-05-2020 09:45 AM)Martin Hepperle Wrote:  
Code:
  CK2&Dispatch
  REALREAL                                         ( need at least 2 Reals on the stack )
  ::
    ( translate and scale Real X, Y values )
    5PICK %-                                       ( 1:Y-Y0      2:X         3:SY 4:SX 5:Y0 6:X0 )
Ouch, you're checking for only two parameters (which have to be reals), but then you're blindly accessing stack level five. That's gonna blow up in your face when you start this program with less than five parameters, or when the upper ones aren't of the right type (looks like you want them to be reals too). A bit further down you access level six too (by the way, for that access you can use 6PICK instead of BINT6 PICK, that command does exist).
Perhaps try basic CK&Dispatch1 instead (which doesn't check for parameter presence; the CK<n> family only goes to five parameters, so you'll have to do it manually anyway, something like DEPTH BINT6 #<case SETSTACKERR before the CK&Dispatch1), change the REALREAL (value: 11h) to something like # 11111 so it checks all five levels it can deal with, then also add code to manually check for the type of the sixth level (perhaps 6PICK TYPEREAL? NOTcase SETTYPEERR, or alternatively for the automatic number type conversion from ZINTs a second CK&Dispatch1, like this: 6ROLL CK&Dispatch1 BINT1 :: 6UNROLL).

I'll also help you along with understanding how I made the Saturn code build the string. I should've commented it like this from the beginning... (Also, sorry about the weird characters that might show up in the longer comment lines, the forum is responsible for that and I can't remove them.)
...

Thank you for commenting the assembler code - I had started to decrypt it but now it becomes much easier for me. All this nibble twisting is still confusing me.

I had not seen that 6PICK exists - I only found 5PICK in my documents.

You are right concerning the type check for the stack parameters - I will see how foolproof this routine should be. Maybe I will simply stop at the test for 5 REALs with # 11111 so as not to slow down the code too much. Maybe I'll add the simple DEPTH check to make it clear that 6 parameters are needed.

Concerning the scaling to 12 bits I will leave it as it is - maybe one day I can run it on a real Tektronix display with the high resolution enhancements.
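For reference, the translate-and-scale step from the quoted code (subtract the origin, multiply by the scale factor) can be sketched as below. This is a Python model, not the SysRPL routine. The rounding and the clamping to 0…4095 are assumptions added for the illustration and are not confirmed to be part of the posted code. The name to_tek_range is hypothetical.

```python
def to_tek_range(value: float, origin: float, scale: float) -> int:
    """Map a user coordinate to a 12-bit terminal coordinate.

    Mirrors the (value - origin) * scale step of the quoted routine.
    Rounding and clamping are assumptions made for this sketch.
    """
    scaled = round((value - origin) * scale)
    return max(0, min(4095, scaled))
```

With the factors from the first post, to_tek_range(123, 0, 10) gives 1230 and to_tek_range(1, 0, 1500) gives 1500. Values that would fall outside the 12-bit range are pinned to 0 or 4095 in this sketch.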

I also managed to use the EQ list from the Plot application, so I now iterate over all functions set up there without the program becoming too sluggish.

Martin