Thread: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal (HP Forums, https://www.hpmuseum.org/forum, General Forum, /thread-15421.html)
Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Martin Hepperle - 08-03-2020 08:34 AM

I wrote a small HP-49G RPL program to test a Tektronix vector graphics terminal emulator (which I would normally use with a CP/M or MS-DOS system), but why not with a graphing calculator?

The Tektronix terminals expect each pair of 12-bit integer x-y coordinates [0…4095] in a specific format, as a sequence of 5 printable characters with the following bit patterns:

Input Code:
Code:
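As a rough cross-check of the format, the five-character encoding can be sketched in Python. This is not the RPL routine from this thread. The name `tek_xy` is made up for illustration, and the byte layout (high Y, an extra byte holding the low two bits of Y and X, low Y, high X, low X, with fixed tag bits 0x20, 0x60, 0x60, 0x20, 0x40) is an assumption pieced together from the fixed-bit string and the mask quoted later in the thread.

```python
def tek_xy(x: int, y: int) -> bytes:
    """Encode one 12-bit coordinate pair as 5 printable bytes (assumed layout)."""
    if not (0 <= x <= 4095 and 0 <= y <= 4095):
        raise ValueError("coordinates must be in 0..4095")
    hi_y = 0x20 | (y >> 7)                    # byte 1: upper 5 bits of Y
    extra = 0x60 | ((y & 3) << 2) | (x & 3)   # byte 2: low 2 bits of Y and of X
    lo_y = 0x60 | ((y >> 2) & 0x1F)           # byte 3: middle 5 bits of Y
    hi_x = 0x20 | (x >> 7)                    # byte 4: upper 5 bits of X
    lo_x = 0x40 | ((x >> 2) & 0x1F)           # byte 5: middle 5 bits of X
    return bytes([hi_y, extra, lo_y, hi_x, lo_x])


if __name__ == "__main__":
    # Print the encoded point as a hex string for inspection.
    print(tek_xy(1234, 3000).hex())
```

Under these assumptions, `tek_xy(0, 0)` yields the five characters space, backtick, backtick, space, `@`, which matches the fixed-bit string used later in the thread.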
I wrote several versions of the conversion subroutine
Variants 3 and 4 were the slowest and showed no difference in speed. Version 2, using the stack, was slightly faster than version 1 with the local variables. So currently version 2 is my favorite.

Do you see more options to speed up the code without resorting to SysRPL or other heavy tricks?

On entry to my routine TEKXY the X and Y coordinates are on the stack. In my example the ranges are X=[0…400], Y=[0…2]. Therefore Y is scaled by 1500 and X by 10 to map approximately to the 12-bit range [0…4095]. The output of the routine is a 5-character string.

Example:
Code: 2: 123 note: X

Code of variant 2 (line breaks added to show scaling, composition of the bytes 1…5, and final cleanup). The first scaling step leaves X and Y swapped on the stack so that the PICKs use this stack layout:
Code: 1: Y
Code: 'TEKXY'

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Werner - 08-03-2020 10:23 AM

A quick reply:
Werner

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - John Keith - 08-03-2020 11:06 AM

Other than what Werner said, I can't see anything that would make a noticeable improvement. If you know Saturn assembly language, that would give you a huge speedup, probably 100x. I wouldn't be much help there; I haven't done any assembly programming since the HP-71 days.

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Martin Hepperle - 08-03-2020 11:12 AM

Thank you for your comments and tips. The only thing I found in addition to the OVER and the superfluous MOD 32 was to use NIP instead of SWAP DROP at the end of TEKXY. Each time I go through the Pocket guide I learn some new commands. Also, I was somewhat astonished to see that the Binary Integer variant was slower than the variant with Reals. Maybe I should try to refresh my SysRPL knowledge as a compromise - assembler is too much for me and this application.

Martin

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Paul Dale - 08-03-2020 11:24 AM

Were the binary integers done with shifts and ANDs? I'm not surprised that they are slower than expected; they are variable length, which is painful (but not as bad as the 16C's integer support, which has carry and overflow as well). System RPL would be a lot faster, the big wins being the 20-bit short integers and much reduced overhead.

Pauli

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - 3298 - 08-03-2020 01:37 PM

Apart from the small stackrobatics improvements others have already mentioned, that's probably the fastest you can do with just UserRPL. Oh, and the NIP can be removed entirely if you switch the last PICK3 to a ROT. 32. MOD can only be removed after the two instances of 128. / IP; in the other spots it's still needed to throw the high bits away.

Side note: the OVER improvement made me think of the signature of fellow forum member HP67: "It ain't OVER 'till it's 2 PICK".
Such a memorable signature sure makes remembering this command equivalence easier.

I also toyed with a user binary integer version (word size doesn't matter as long as it's at least 12: all computations are 64-bit internally, with a truncation to the configured word size afterwards). Binary integers bring a bit-wise AND to the table, which is likely a little faster than MOD (32. MOD becomes #1Fh AND, 4. MOD becomes #3h AND), plus bit-wise OR as a + replacement if we like (performance should be equal there). They also have bit-shifts, which are probably faster than real number division too (and taking the integer part becomes superfluous, of course), but the shift amounts are hard-wired into the respective commands, at one bit for certain commands or one byte for the rest, meaning multiple commands may be needed for arbitrary shifts. (In this instance two commands each, as 128. / IP becomes SL SRB, 4. / IP becomes SR SR, and 4. * becomes SL SL.)

However, binary integers are (for whatever insane reason) not accepted for CHR, so we lose all performance gains to back-and-forth conversion. At least the conversion at the start of the program can be made implicit by simply translating your scaling factors into binary integers (#1500d and #10d; we can take advantage of the base suffix to keep things understandable). I also translated the fixed parts added just before CHR into binary; there is no reason to lose additional performance by making + convert them at runtime. Unfortunately, the back-conversion cannot be made implicit by leaving the + argument as real, like we could with the multiplication at the start... + goes binary if any argument is binary. Result:
Code: \<< #1500d * SWAP #10d *

When aiming for performance, though, I think SysRPL isn't just the heavy equipment, it's the right tool for the job. It leaves out all that slow error-checking, and we can use extremely fast BINTs for just about everything. There's #AND for them, and #>CHR, the equivalent of UserRPL CHR, does take a BINT.
(Yes, we need to CHR>$ the first one, but subsequent chars can just be appended to the string with >T$ without such a conversion.) No larger shifts, but 1-bit shifts are at the core of the Saturn ASM implementations of #2* and #2/. (There's also #8* with three 1-bit shifts, but we only need two, and there's no #4* unfortunately.) A more-or-less direct translation into SysRPL is therefore:
Code: ::

Those chains of seven times #2/ bug me though. How about something that uses arithmetic in place of bitwise logic? #/ gives us both the quotient and the remainder, which could be quite useful: we can split a number into 5 high bits and 7 low bits with it (BINT128 #/), then split the low 2 bits off the latter with BINT4 #/ ... UserRPL does have something similar in some CAS command, I think it was IDIV2, but I'm more of a SysRPL person, so I don't quite remember that stuff. Oh well. Here's a SysRPL version then.
Code: ::

By the way, what's the deal with the scaling factors? Both SysRPL versions (and the UserRPL binary integer version) convert the input numbers to integers before multiplying, which may result in a loss of precision if there is a fractional part. That can be fixed by replacing:
Code: COERCE2 1500 #* SWAP BINT10 #*
Code: % 1500. %* SWAP %10 %* COERCE2
Code: BINT4 #/ BINT96 #+ #>CHR ROTSWAP >H$
Code: DROP CHR \60 >H$
Code: BINT4 #/ BINT96 #+ #>CHR SWAPDROP >H$

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - cyrille de brébisson - 08-04-2020 06:45 AM

Hello,

You might be able to gain on the OR and MOD by doing string OR/AND, thereby saving a lot of work (I do seem to remember that you can do logical operations on strings. Hope I am correct), as in:

'TEKXY'
<<
1500. * DUP 128. / CHR @ byte 1
OVER 4. / CHR @ byte 3
4. ROLLD 10. * DUP 128. / CHR @ byte 4
OVER 4. / CHR @ byte 5
@ stack is Y b1 b2 X b4 b5
ROT 4. MOD 6. ROLLD 4. MOD 4. * + CHR UNROT + + + +
"?????" AND @ This is a specially formed string with 5 chr 31.
@ Create it on the stack and then edit it to have it "hard" in your program without having to create it.
" `` @" OR
>>

I have not tested it; it is kind of a proof of concept. Also, my ROLL and others are WAY out of date, so I might have the rotation direction wrong in some cases... However, should this technique work, it should be quite fast.

Cyrille

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Werner - 08-04-2020 07:07 AM

Hello Cyrille. CHR unfortunately rounds its argument, so you'll need 4 more IPs, at least.

Cheers, Werner

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Werner - 08-04-2020 08:23 AM

The shortest I have been able to come up with so far: 177.5 bytes, #9FBDh
Code: @ Y/1500 X/10

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Martin Hepperle - 08-04-2020 09:45 AM

Thank you very much for your ideas and solutions! The SysRPL version especially seems the most attractive to me. I also toyed a bit with SysRPL (I must be considered a perfect noob in this black art) but ended up with only a 25% improvement in speed. But I used many conversions to HXS numbers to get bitOr and bitAnd, which was unnecessary. And it took me 2 hours to learn that I need a "11" or "REALREAL" at the start of the routine instead of the "BINT1" I had...

The background with the scaling factors is this: The Tektronix terminals use a 12-bit integer range (0...4095) for x and y. In my practical application I have to scale from my arbitrary, real-valued user coordinate system to this integer range. That's why in my example case I used 1500 and 10 as scaling factors. In real life these factors depend on the data range to be plotted and they are always needed. [For real-world scaling the x-values would be scaled by 4095*(x-xMin)/(xMax-xMin) for a full-width plot.]

However, in the end everything goes through the serial interface at 9600 baud, so there are more speed limits in the chain.
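The scaling formula and the serial-line limit mentioned above can be put into numbers with a small Python sketch. The function names are hypothetical, rounding to the nearest integer is an assumption (the RPL routines may truncate instead), and the throughput estimate assumes 10 bits on the wire per character (8 data bits plus start and stop bit) and a full 5 characters per point.

```python
def scale_to_tek(value: float, v_min: float, v_max: float, full_scale: int = 4095) -> int:
    """Map value from [v_min, v_max] onto the integer range 0..full_scale."""
    if v_max == v_min:
        raise ValueError("empty data range")
    t = (value - v_min) / (v_max - v_min)
    t = min(max(t, 0.0), 1.0)            # clamp points outside the plot range
    return int(full_scale * t + 0.5)     # round to nearest (assumption)


def max_points_per_second(baud: int = 9600, bits_per_char: int = 10,
                          chars_per_point: int = 5) -> float:
    """Upper bound on points per second that the serial line can carry."""
    return baud / (bits_per_char * chars_per_point)


if __name__ == "__main__":
    print(scale_to_tek(200.0, 0.0, 400.0))   # middle of the X range
    print(max_points_per_second())           # serial-line ceiling
```

With these assumptions the line itself caps the plot at 192 points per second, no matter how fast the conversion routine becomes.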
The end result is that the HP 48/49 can send vector graphics to an external display (a hardware terminal or software emulator like Windows TeraTerm).

Martin

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - 3298 - 08-04-2020 11:40 PM

Also tried my hand at a Saturn ASM implementation. Things probably won't get much faster than that ... though I left the error checking and integer conversion in SysRPL. Those two commands are themselves implemented in Saturn ASM anyway, so there's not much to be gained by replacing them with my own ASM. You'll have to do the scaling before calling this; but because you apparently want to change that on a per-application basis anyway, that's probably for the better. Leaving that in real number land also avoids those pesky accuracy issues.
Code: ::

By the way, TEVAL places the performance of this just ahead of the SysRPL version, but it's lying (it measures mostly its own overhead). Put it in a loop running a few dozen times, measure that, and divide by the number of iterations, and you'll see.

The way it works is to build the string in a 64-bit register (think of a user binary integer) via repeated shifts and loading one to three nibbles of a number into the low part of the register. (The Saturn CPU's register fields are very handy for this.) The processing order is from last to first byte due to little-endian byte order in memory (see below for why that's important), and because it's easier to work in the low section of the register and then shift the results out into the higher part.
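The shift-and-load idea can be mimicked in Python with an ordinary integer standing in for the 64-bit register. This is only an illustration, not a translation of the Saturn code: `tek_xy_packed` is a hypothetical name, the data mask 0x1F1F1F0F1F is the one quoted in this post, and the fixed-bit constant 0x4020606020 is an assumption derived from the tag bytes 0x20, 0x60, 0x60, 0x20, 0x40.

```python
DATA_MASK = 0x1F1F1F0F1F    # data bits to keep per byte (mask quoted in the post)
FIXED_BITS = 0x4020606020   # assumed fixed tag bits, lowest byte = first character


def tek_xy_packed(x: int, y: int) -> bytes:
    """Build the 5-byte string in one integer 'register', last byte first."""
    fields = [
        y >> 7,                     # byte 1: upper bits of Y
        ((y & 3) << 2) | (x & 3),   # byte 2: low 2 bits of Y and of X
        y >> 2,                     # byte 3: surplus high bits are masked later
        x >> 7,                     # byte 4: upper bits of X
        x >> 2,                     # byte 5: surplus high bits are masked later
    ]
    reg = 0
    for field in reversed(fields):
        # Shift earlier results up one byte, then load the new field into the
        # low byte only (imitating a two-nibble register field load).
        reg = (reg << 8) | (field & 0xFF)
    reg = (reg & DATA_MASK) | FIXED_BITS
    # Little-endian output: the lowest byte becomes the first character.
    return reg.to_bytes(5, "little")


if __name__ == "__main__":
    print(tek_xy_packed(1234, 3000))
```

Masking once at the end replaces the per-byte MOD operations of the UserRPL variants, which is why the surplus high bits can simply be left in place while the register is being filled.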
After that, mask out the areas where fixed bits should be (the 0 in the mask 1F1F1F0F1F is no accident; that's the fourth-highest bit in the second byte, which shall be fixed 0), put in the fixed 1-bits with another mask, and finally perform a small trick for the output: store it as a hxs (also known as user binary integer) with only 40 bits, and just change the prolog to a string's, because their format is essentially the same; the lowest byte becomes the first character in the string due to the aforementioned byte order.

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - cyrille de brébisson - 08-05-2020 06:29 AM

Hello,

> Integer conversion in SysRPL. Those two commands are themselves implemented in Saturn ASM anyway, so there's not much to be gained by replacing them with my own ASM

Actually, you would gain a lot by doing them yourself... The reason is that it is not the conversion which takes long, but the memory allocation for the result. Mallocs are slow on the Saturn because they require memory movement (of the RPL return stack). And memory moves are slow!!!!* So, reducing the number of object creations is the main speedup for any RPL program on the 49.

Cyrille

*Come to think of it, I think that the malloc calls have been reimplemented directly in C in the ARM-based series.... hence helping a lot to speed things up.

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Werner - 08-05-2020 06:36 AM

Careful with =PUSHhxs, that may cause a garbage collection... if your routine is in TEMPOB when it is executed (as is the case when it resides in a library in a covered port, for instance), and a garbage collection occurs, it will crash. One easy way to avoid it is to put a 5-char dummy string on the stack, UNROT it, then read and treat the arguments and overwrite the string.
Cheers, Werner

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Martin Hepperle - 08-05-2020 09:45 AM

With the SysRPL version I achieve a speedup of about 2.5 compared to the UserRPL version. This is fine for me and keeps me from falling asleep while watching a plot appear on the screen.

My demo program now uses the functions Y1(X) and Y2(X) and the scaling as set up in the HP 49G plot application. Therefore I replaced the hardwired scaling factors by data derived from the plot parameter list PPAR. An initializing program calculates the translation values and the scale factors and places them on the stack. In my routines I now pick the scaling parameters from the stack and perform the Real transformations (x-x0)*sx and (y-y0)*sy before COERCing them into BINTs and composing the bytes in Tektronix format. This seems to be the most efficient way to perform these transforms (except resorting to Assembler, of course).

The nice thing is that I can now reproduce the same graph on the HP 49G as well as on my Tektronix emulator. Thank you again for your help, which also triggered me to learn more about SysRPL.

Martin

PS: I must figure out first how the assembler version works before I climb up (or is it down?) to that level...

Code: ASSEMBLE

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - 3298 - 08-05-2020 10:40 AM

(08-05-2020 06:29 AM)cyrille de brébisson Wrote: Actually, you would gain a lot by doing them yourself...

Uhm, good point... I guess I'll have to learn how to COERCE reals to binary integers in Saturn ASM.

(08-05-2020 06:36 AM)Werner Wrote: Careful with =PUSHhxs, that may cause a garbage collection... if your routine is in TEMPOB when it is executed (as is the case when it resides in a library in a covered port, for instance), and a garbage collection occurs, it will crash.

Doesn't that apply to pushing BINTS > 131 too?
That would make pretty much any ASM program pushing stuff other than already existing objects unusable from covered ports (or freshly compiled on the stack and not stored in USEROB yet). I would have thought that garbage collection adjusts the RSTK entries if necessary, so it can return into the moved code. If not, I could indeed have the surrounding SysRPL push a dummy string (best practice would be to TOTEMPOB it too; while its contents don't matter as they'll get overwritten, changing the copy embedded in the SysRPL program would change checksums of directories or libraries in port 0). Or maybe have the code copy its end, starting from the PUSHhxs call, into that scratch area at 80100 so it can run in peace. (By the way, does anybody know how long that scratch area is? It's obviously sufficient for a few instructions, but knowing its length could be handy in other cases.)

(08-05-2020 09:45 AM)Martin Hepperle Wrote:

Ouch, you're checking for only two parameters (which have to be reals), but then you're blindly accessing stack level five. That's gonna blow up in your face when you start this program with less than five parameters, or when the upper ones aren't of the right type (looks like you want them to be reals too). A bit further down you access level six too (by the way, for that access you can use 6PICK instead of BINT6 PICK, that command does exist).

Perhaps try basic CK&Dispatch1 instead (which doesn't check for parameter presence; the CK<n> family only goes to five parameters, so you'll have to do it manually anyway, something like DEPTH BINT6 #<case SETSTACKERR before the CK&Dispatch1), change the REALREAL (value: 11h) to something like # 11111 so it checks all five levels it can deal with, then also add code to manually check for the type of the sixth level (perhaps 6PICK TYPEREAL? NOTcase SETTYPEERR, or alternatively, for the automatic number type conversion from ZINTs, a second CK&Dispatch1, like this: 6ROLL CK&Dispatch1 BINT1 :: 6UNROLL).
I'll also help you along with understanding how I made the Saturn code build the string. I should've commented it like this from the beginning... (Also, sorry about the weird characters that might show up in the longer comment lines; the forum is responsible for that and I can't remove them.)
Code: ::

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - BruceH - 08-06-2020 12:30 AM

(08-04-2020 09:45 AM)Martin Hepperle Wrote: The background with the scaling factors is this:

Since you are scaling, you could run your Tektronix in its 'backwards compatible' mode by sending the 4-character codes used by the previous 1024x1024 resolution model (which it should auto-detect). These are much simpler to calculate. The following code expects X,Y on the stack in the range 0..1023 and uses list processing to speed up ordinary UserRPL.
Code: << DUP ROT DUP 4 ->LIST R->B

The other option to consider would be to generate your coordinates as an N x 2 matrix or list and use list processing to calculate a string of N codes all at once. I'll see if I can adapt the above when I have a spare moment.

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - cyrille de brébisson - 08-06-2020 06:15 AM

About garbage collection: Your ASM routine, even if in the heap, will not be garbage collected because its address is in the return stack. As a result, it will be marked as in use and kept. So there is no risk.

Nice list-based version. It will be shorter in space, but most likely slower to execute, as list processing is done in SysRPL and generates memory allocs...

Cyrille

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Werner - 08-06-2020 06:28 AM

(08-05-2020 10:40 AM)3298 Wrote: Doesn't that apply to pushing BINTS > 131 too? That would make pretty much any ASM program pushing stuff other than already existing objects unusable from covered ports (or freshly compiled on the stack and not stored in USEROB yet).
I would have thought that garbage collection adjusts the RSTK entries if necessary, so it can return into the moved code.

Garbage collection adjusts the SYSRPL pointers, but not the ML return stack. So if your code moves because of a gc (because it is in TEMPOB), the return will be to a wrong address. Any PUSH may cause a gc. PUSHing a BINT<132 onto the stack still uses up the 5 nibbles of a data stack entry, and may cause garbage collection. BUT: if the PUSH is the last instruction in your CODE object (so it has to be combined with =Loop, like for instance PUSH#LOOP), then there is no return to the code object, but to the next SysRPL object in the runstream, and SYSRPL is impervious to gc's.

And you are absolutely right that the dummy string must be TOTEMPOB'd !! Forgot about that, it's been a while..

Cheers, Werner

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Werner - 08-06-2020 06:35 AM

(08-06-2020 06:15 AM)cyrille de brébisson Wrote: About garbage collection:

There is no risk of it being removed, but of being *moved* in TEMPOB. But I should not be explaining this to you, of all people ;-)

Werner

RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal - Martin Hepperle - 08-06-2020 09:17 AM

(08-05-2020 10:40 AM)3298 Wrote: ...

Thank you for commenting the assembler code - I had started to decrypt it but now it becomes much easier for me. All this nibble twisting is still confusing me. I had not seen that 6PICK exists - I only found 5PICK in my documents.

You are right concerning the type check for the stack parameters - I will see how foolproof this routine should be. Maybe I simply stop at the test for 5 REALs with 11111b to not slow down the code too much. Maybe I'll add the simple DEPTH check to make it clear that 6 parameters are needed.

Concerning the scaling to 12 bits I will leave it as it is - maybe one day I can run it on a real Tektronix display with the high resolution enhancements.
I also managed to use the EQ list from the Plot application so that I now iterate over all functions set up in the Plot application without becoming too sluggish.

Martin
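For comparison with the 12-bit routine kept here, the 4-character 1024x1024 format that BruceH suggested earlier in the thread can be sketched in Python. The name `tek_xy_10bit` is hypothetical, and the layout (high Y, low Y, high X, low X with tag bits 0x20, 0x60, 0x20, 0x40) is an assumption based on the usual 10-bit Tektronix addressing, not on code from this thread.

```python
def tek_xy_10bit(x: int, y: int) -> bytes:
    """Encode one 10-bit coordinate pair as 4 printable bytes (assumed layout)."""
    if not (0 <= x <= 1023 and 0 <= y <= 1023):
        raise ValueError("coordinates must be in 0..1023")
    return bytes([
        0x20 | (y >> 5),     # high 5 bits of Y
        0x60 | (y & 0x1F),   # low 5 bits of Y
        0x20 | (x >> 5),     # high 5 bits of X
        0x40 | (x & 0x1F),   # low 5 bits of X
    ])


if __name__ == "__main__":
    print(tek_xy_10bit(100, 500))
```

Each coordinate splits into just two 5-bit halves, so there is no shared extra byte to assemble, which is what makes this mode simpler to compute than the 12-bit one.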