Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
08-06-2020, 10:31 AM
Post: #21
 3298 Member Posts: 179 Joined: Oct 2014
RE: Speed of HP-49G Routine for Tektronix Vector Graphics Terminal
Apologies for dragging this thread further off-topic, but I did an experiment to check how safe memory allocations from code in TEMPOB really are.
The code is designed to detect whether it has been moved during execution. A pair of "checkpoints" (each consisting of a GOSUB to a label just after it, followed by C=RSTK) is used to calculate the distance between two labels as they are encountered at runtime; that distance is then compared to the one calculated by the compiler. All this surrounds a call to PUSHhxs, since we've apparently chosen this call in our discussion as our favorite GC trigger.
The SysRPL program around the code deliberately sets up a low-memory condition, then puts an HXS into TEMPOB followed by a copy of the code object. The HXS is then dropped to let the GC move the code object in order to make room for the code-pushed HXS, and the code is run. As a little cleanup, the low-memory condition is lifted afterwards.
This program leaves the ASM-pushed HXS on the stack (different from the SysRPL-pushed one in its contents, by the way; that makes them easy to tell apart), and a SysRPL TRUE/FALSE flag telling me whether the code was moved in the middle of execution.
Code:
::
  ' CODE
    GOSBVL SAVPTR
    GOSUB label1
*label1
    C=RSTK
    R4=C.A ; the garbage collection documentation promises that it leaves R4 intact
    LA 123456789ABCDEF
    P=15
    GOSBVL PUSHhxs ; triggers GC, should ultimately succeed because another hxs could be cleaned up
    GOSUB label2
*label2
    C=RSTK
    A=R4.A
    C-A.A ; C.A is the distance between where the labels were encountered at runtime
    LA(5) label2-label1
    ?A#C.A ; is the distance different from the one calculated during compilation?
    GOYES end ; if yes, we were moved by the GC
*end
    GOVLNG "PushT/FLoop"
  ENDCODE
  GARBAGE (I want to control precisely how much memory is released in the next garbage collection)
  NULLHXS
  MEM
  3PICK OCRC DROP #- (subtract length of code object)
  BINT76 #- (leave enough room for the memory-eating hxs's object header and stack level, one more hxs and its stack level, plus some slack space for TEMPOB per-object overhead for them and the code object, but not enough for another hxs)
  EXPAND (does TOTEMPOB and in-place expansion)
  HXS 10 F0E1D2C3B4A59687 TOTEMPOB (needs 10 nibbles for prolog and size field, 16 nibbles for data, 5 nibbles for stack level)
  ROT TOTEMPOB (get the code object into TEMPOB after the hxs so the GC can affect it)
  SWAPDROP (drop the hxs to let the GC move stuff after it like the code object into its space)
  EVAL (run the ASM code)
  ROTDROP (drop the memory-eater, leave the code-pushed hxs and flag for inspection)
;
The program has a spot where the amount of memory left can be adjusted: it currently says BINT76. Very low values cause an "Insufficient memory" error, of course. High values permit the HXS to be pushed without the GC needing to run, so the results become the HXS and FALSE.
The interesting part is the range between these two. BINT119 seems to be the lowest value without GC interference; BINT118 does something very different - but it's not a crash from the last part of the code getting moved and in its original spot overwritten by the pushed HXS. It's the GC in "I'm suffocating" panic mode! This pauses the program to ask the user a number of questions about what shall be deleted (stack, last stack, last commands, various global variables and libraries in port 0, etc.) Letting it delete something obviously gives it room to breathe; answering "no" to all of them eventually goes up a directory and starts over, until it reaches HOME - the libraries in port 0 are last. After that it seems to just kill the program.

So, conclusion: the GC pins objects referenced by the Saturn RSTK and doesn't even move them. If that leaves it unable to do anything (memory allocation only happens after the last object in TEMPOB), it panics. Thus memory allocation from code in TEMPOB is safe.
08-06-2020, 11:21 AM
Post: #22
 Werner Senior Member Posts: 696 Joined: Dec 2013
A couple of remarks:
The gc does not update the Saturn RSTK. So if one happens during the PUSHhxs call, and the code has moved, you will not return to 'GOSUB label2' at all. So you will never end up with a difference. You will just crash.
You have only tested the low memory conditions; you should drop the large hxs, not the small one, so that after gc, you don't end up in the 'panic mode'. Back up your 49G first.
There are tons of assembly programs out there that guard against this gc problem, either by allocating memory up front, performing a gc before starting the code objects, or dropping into SysRPL to perform a gc when needed. And then there's the version I used for LSORT, an invention of my own.
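
A toy model may help picture the first remark. The Python sketch below is not Saturn code and makes no claim about how the real collector works; the object sizes, the offset 120 and the helper `compact` are invented for illustration. It only shows why an absolute return address goes stale when the object containing it slides down and nothing patches the saved address.

```python
# Toy model only: addresses are plain integers, not real Saturn addresses.

def compact(objects):
    """Slide live objects down over dropped ones; return {name: new_start}."""
    new_starts = {}
    next_free = 0
    for name, size, live in objects:
        if live:
            new_starts[name] = next_free
            next_free += size
    return new_starts

# Hypothetical TEMPOB layout before the GC: a dropped HXS, then the code object.
HXS_SIZE = 26        # 10 nibbles prolog and size field + 16 nibbles data
CODE_SIZE = 200      # made-up size of the code object
CALL_OFFSET = 120    # made-up offset of the return point inside the code

objects = [("hxs", HXS_SIZE, False), ("code", CODE_SIZE, True)]

old_start = HXS_SIZE                       # code sits right after the HXS
return_addr = old_start + CALL_OFFSET      # absolute address saved by the call

new_start = compact(objects)["code"]       # the code slides down to address 0
correct_return = new_start + CALL_OFFSET   # where execution should resume
stale_by = return_addr - correct_return    # error if the saved address is not patched

print(new_start, return_addr, correct_return, stale_by)
```

In this model the saved address is off by exactly the size of the dropped object, so the return lands in whatever now occupies the old location.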

Cheers, Werner
08-06-2020, 12:06 PM
Post: #23
 Werner Senior Member Posts: 696 Joined: Dec 2013
Here's my ASM version

Code:
::
  CK2&Dispatch
  REALREAL
  ::
    COERCE2
    "ZZZZZ"
    TOTEMPOB
    UNROT
    CODE                A: Y                                                    C: X
      GOSBVL =POP2#     xxxx xxxx xxxx xxxx xxxx 0000 0000 CBA9 8765 4321       xxxx xxxx xxxx xxxx xxxx 0000 0000 CBA9 8765 4321
      C=0.M C+C.A CSL.A                                                         0000 0000 0000 0000 0000 000C BA98 7654 3210 0000
      A=C.M C=0.M C+C.A 0000 0000 0000 0000 0000 000C BA98                      0000 0000 0000 0000 0000 0000 0007 6543 2100 0000
      CSL.A CSL.M CSL.M                                                         0000 0000 0000 0007 6543 0000 0000 2100 0000 0000
      CSR.A CSR.A C+A.M                                                         0000 0000 0000 0007 6543 000C BA98 0000 0000 2100
      CSL.M CSL.M                                                               0000 0007 6543 000C BA98 0000 0000 0000 0000 2100
      A=0.M ASL.A A+C.A 0000 0000 0000 0000 0000 0000 CBA9 8765 4321 xy00
      A+A.A C=A.A A=0.M 0000 0000 0000 0000 0000 0000 0000 7654 321x y000                                000C BA98 7654 321x y000
      CSR.A CSR.A CSR.A 0000 0000 0000 0000 0000 0000 0000 7654 321x y000                                0000 0000 0000 000C BA98
      A+A.A ASL.A       0000 0000 0000 0000 0000 0007 6543 21xy 0000 0000                                0000 0000 0000 000C BA98
      A+C.W ASL.M
      LC 4020606020
      P=9
      A!C.WP
      C=DAT1.A
      CD1EX
      D1+10
      DAT1=A.WP
      D1=C
      P=0
      GOVLNG =Loop
    ENDCODE
  ;
;
@

Cheers, Werner
08-06-2020, 02:13 PM (This post was last modified: 08-06-2020 03:04 PM by 3298.)
Post: #24
 3298 Member Posts: 179 Joined: Oct 2014
(08-06-2020 11:21 AM)Werner Wrote:  The gc does not update the Saturn RSTK. So if one happens during the PUSHhxs call, and the code has moved, you will not return to 'GOSUB label2' at all. So you will never end up with a difference. You will just crash.
I wasn't quite sure whether it would adjust RSTK levels. But I specifically dropped the small HXS so the new one would overwrite part of the code (to trigger a visible crash). If I dropped the large one (>200KB in my test on a mostly empty x49gp instance), the GC might well copy the code object to a place so far away that the old copy (now in unallocated space) may survive without getting overwritten, permitting it to finish its job without getting disturbed.
Dropping the small one should free up just enough space for the new HXS, as they are the same size.

I retested it just now, and while I somehow cannot trigger panic mode anymore (I wonder what changed? This is on the same x49gp instance as before), it now works all the way down to BINT38 (while returning FALSE). BINT37 causes "Insufficient Memory". Just for you I also changed the code to drop the big HXS instead (ROTDROP instead of SWAPDROP) - no change: fails for BINT37, works with return value FALSE for BINT38. No crash whatsoever.
Then I changed it so another object is put into TEMPOB behind the code which remains there while it's running. The intention is to have something in addition to the code-pushed HXS that would overwrite the end of the code in its pre-move position. I made it larger than the dropped small HXS (32 instead of 16 nibbles of data) so the GC wouldn't try to be smart and swap it into the place of the dropped one. Of course, all that means the number has to be adjusted up. Finally, I stumbled into a warmstart at 150 - but weirdly, it just works at BINT131.

I don't know what's going on anymore. Garbage collector and/or memory allocator, are you drunk?

Oh well. Here's a version (of the Saturn ASM program that's actually for this topic) which sidesteps the issue by running pretty much the entire code from the 80100 scratch area. No GC interference there. I could've put only the part starting at the PUSHhxs call into that area (or even only the part after it if I load 80100 into RSTK manually and call PUSHhxs with a GOVLNG instead), but since the ARM example in the MASD docs copies 28 ARM instructions weighing 8 nibbles each into that place, it's probably large enough for 43 Saturn instructions of varying length if I counted correctly (pretty much all of them shorter, except for the LA/LC pair loading the AND/OR masks; in many cases significantly shorter).

Code:
::
  CK2&Dispatch
  REALREAL
  ::
    COERCE2
    CODE
      GOSUB end
*begin
      GOSBVL POP2# ; X in C.A, Y in A.A
      R0=C.A ; SAVPTR uses C.A, so it has to be backed up
      GOSBVL SAVPTR
      C=R0.A
      B=C.B ; B = X8.X7.X6.X5.X4.X3.X2.X1
      BSL.X ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.0.0.0
      BSL.A ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.0.0.0.0.0.0.0
      BSL.A ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.0.0.0.0.0.0.0.0.0.0.0
      B=C.X ; B = X8.X7.X6.X5.X4.X3.X2.X1.X12.X11.X10.X9.X8.X7.X6.X5.X4.X3.X2.X1
      BSRB.X ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.X5.X4.X3.X2
      P=6
      BSL.WP ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.X5.X4.X3.X2.0.0.0.0
      B=A.B ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1
      BSL.WP ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.0.0.0.0
      P=0
      B=A.P ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1
      B+B.W ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.0
      B+B.W ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.0.0
      CBIT=0.2 ; clear X3 so it doesn't interfere with Y1
      CBIT=0.3 ; same for X4
      B+C.P ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.X2.X1
      BSL.W ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.X2.X1.0.0.0.0
      BSL.W ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.X2.X1.0.0.0.0.0.0.0.0
      ASR.X ; A.X = 0.0.0.0.Y12.Y11.Y10.Y9.Y8.Y7.Y6.Y5
      A+A.X ; A.X = 0.0.0.Y12.Y11.Y10.Y9.Y8.Y7.Y6.Y5.0
      ASR.X ; A.X = 0.0.0.0.0.0.0.Y12.Y11.Y10.Y9.Y8
      B=A.B ; B = X8.X7.X6.X5.X4.X3.X2.X1.0.X12.X11.X10.X9.X8.X7.X6.Y8.Y7.Y6.Y5.Y4.Y3.Y2.Y1.Y4.Y3.Y2.Y1.X2.X1.0.0.0.Y12.Y11.Y10.Y9.Y8
      LA 4020606020
      LC 1F1F1F0F1F
      P=9
      B&C.WP ; B = X8.X7.X6.X5.X4.X3.0.0.0.X12.X11.X10.X9.X8.0.0.0.Y7.Y6.Y5.Y4.Y3.0.0.0.0.Y2.Y1.X2.X1.0.0.0.Y12.Y11.Y10.Y9.Y8
      A!B.WP ; B = 0.1.0.X8.X7.X6.X5.X4.X3.0.1.0.X12.X11.X10.X9.X8.0.1.1.Y7.Y6.Y5.Y4.Y3.0.1.1.0.Y2.Y1.X2.X1.0.1.0.Y12.Y11.Y10.Y9.Y8
      GOSBVL GETPTR
      GOSBVL PUSHhxs
      ; the object is now laid out like this: (5) DOHXS (5) length=0000Fh (10) data
      ; data first byte: 0.1.0.Y12.Y11.Y10.Y9.Y8
      ; data second byte: 0.1.1.0.Y2.Y1.X2.X1
      ; data third byte: 0.1.1.Y7.Y6.Y5.Y4.Y3
      ; data fourth byte: 0.1.0.X12.X11.X10.X9.X8
      ; data fifth byte: 0.1.0.X8.X7.X6.X5.X4.X3
      C=DAT1.A
      CD1EX
      LA(5) DOCSTR
      DAT1=A.A ; overwrite the prolog so it turns from DOHXS to DOCSTR
      D1=C
      A=DAT0.A
      D0+5
      PC=(A)
*end
      C=RSTK
      D0=C
      D1=80100
      LC(5) end-begin
      GOSBVL MOVEDOWN
      GOVLNG 80100
    ENDCODE
  ;
;
I should probably point out that the Saturn conversion is mostly academic at this point. The performance improvement is marginal compared to the CPU cycle budget of the calculation our parameters come out of. Just think of what it takes to perform a "simple" addition of two real numbers... and then putting the result of that onto the stack for us to consume, complete with memory allocation and its hazards. Still, it's fun to mess around with this.
08-06-2020, 05:15 PM (This post was last modified: 08-06-2020 05:36 PM by Werner.)
Post: #25
 Werner Senior Member Posts: 696 Joined: Dec 2013
Here's my latest concoction. Only 75 bytes.
Code:
::
  CK2&Dispatch
  REALREAL
  ::
    COERCE2
    CODE                A: Y                                                    C: X
      GOSBVL =POP2#     xxxx xxxx xxxx xxxx xxxx 0000 0000 CBA9 8765 4321       xxxx xxxx xxxx xxxx xxxx 0000 0000 CBA9 8765 4321
      C+C.A CSL.A                                                               xxxx xxxx xxxx xxxx xxxx 000C BA98 7654 3210 0000
      ACEX.X A+A.A      xxxx xxxx xxxx xxxx xxxx 0000 0007 6543 2100 0000       xxxx xxxx xxxx xxxx xxxx 000C BA98 CBA9 8765 4321
      ASL.A ASL.M ASL.M xxxx xxxx xxxx 0007 6543 0000 0000 2100 0000 0000
      ASR.A ASR.A       xxxx xxxx xxxx 0007 6543 0000 0000 0000 0000 2100
      ACEX.A            xxxx xxxx xxxx 0007 6543 000C BA98 CBA9 8765 4321       xxxx xxxx xxxx xxxx xxxx 0000 0000 0000 0000 2100
      ASL.M ASL.W A+C.A xxxx 0007 6543 000C BA98 0000 CBA9 8765 4321 2100
      A+A.A C=A.X       xxxx 0007 6543 000C BA98 000C BA98 7654 3212 1000       xxxx xxxx xxxx xxxx xxxx 0000 0000 7654 3212 1000
      ASR.A ASR.A ASR.A xxxx 0007 6543 000C BA98 0000 0000 0000 000C BA98
      C+C.A CSL.A                                                               xxxx xxxx xxxx xxxx xxxx 0007 6543 2121 0000 0000
      A+C.A ASL.M       0007 6543 000C BA98 0007 6543 0000 2121 000C BA98
      LC 4020606020
      P=9
      A!C.WP
      GOSBVL =SAVPTR
      GOVLNG =PUSHhxsLoop
    ENDCODE
    # 02A2C
    CHANGETYPE
  ;
;
@
08-07-2020, 06:07 AM
Post: #26
 cyrille de brébisson Senior Member Posts: 1,047 Joined: Dec 2013
Hello,

I am pretty sure that I made some changes to the garbage collection code for exactly this reason (check the ASM return stack). But I might never have checked it in...

I honestly do not remember. It was so long ago!

Cyrille

Although I work for the HP calculator group, the views and opinions I post here are my own. I do not speak for HP.
08-07-2020, 07:30 AM
Post: #27
 Werner Senior Member Posts: 696 Joined: Dec 2013
You're excused, Cyrille, for not living in the hp past (42, 48/49) like me ;-)
But if you did implement the return stack update, it was probably for the 50G, not the 49G.
I remember Jean-Yves rewriting the garbage collector for the 49, making it many times faster, and - I think, it's been so long ;-) - also solving the exponential gc running time when exploding large TEMPOB lists onto the stack. But I'm pretty sure he didn't touch the ASM return stack.
Cheers, Werner
08-07-2020, 01:13 PM
Post: #28
 Martin Hepperle Senior Member Posts: 330 Joined: May 2014
... in the meantime I have written my notes for my Tektronix emulation with the HP-49G example program. I have not yet tested the last Assembler version - the previous versions crashed in my setup, which is more likely my fault than yours, Werner.

While I will use the terminal emulator with larger computers, like a 64 KByte CP/M system, it may also be of interest to some of you. Just a simple $10 ESP32 board is able to generate the VGA signal and handle keyboard and mouse at the same time.
It is also a very simple means to present graphics from the HP 4x calculator on a larger screen or video projector.
08-07-2020, 01:27 PM
Post: #29
 Werner Senior Member Posts: 696 Joined: Dec 2013
To make sure you haven't typed in an error (easy enough to do with all the .A, .M and .X suffixes) verify the checksum of the routine (with BYTES): the last one above from me is 75.0 bytes, # DA24h (or # 55844d).
Cheers, Werner
08-08-2020, 09:37 AM (This post was last modified: 08-08-2020 09:38 AM by Martin Hepperle.)
Post: #30
 Martin Hepperle Senior Member Posts: 330 Joined: May 2014
Werner,

so far I was not able to use your assembler variants.
I copied and pasted the assembler source so there should be no typing errors.
The earlier versions simply crashed my HP 49G and the last version returns a string with the wrong content.
The only thing I changed was that I prepended my scaling/translation transformation in front of the COERCE2.
Maybe I am doing something wrong in building the code object.
I simply use the classical HP-Tools chain rplcomp, sasm, sload to build the TEKXY.hp object.
With a command file under Windows this is no problem and the SysRPL code works as it should.
Then I load the result in Emu48 for testing and, if satisfied, I transfer the result to the calculator.

Of course, I would like to test the assembler version too. Maybe you can put me on the right track.
- On which platform/emulator did you test your code?
- Do you obtain the same results as in the examples below?

Sorry for all these questions,
Martin

Stack (top ... bottom, all must be Reals, which is ensured by the calling RPL program)
6: x0
5: y0
4: sx
3: sy
2: x
1: y
TEKXY
...result: dropped x, y replaced by 5 character string...

Example 1:
0.
0.
2048.
2048.
1.
1.
TEKXY
"0``0@"

Example 2:
0.
0.
2048.
2048.
1.1
1.2
TEKXY
"3if1S"

08-08-2020, 11:25 AM
Post: #31
 Werner Senior Member Posts: 696 Joined: Dec 2013
Hello Martin.
If you name my routine 'ATEK' then I obtain your results with
\<<
5. PICK - PICK3 * SWAP
6. PICK - 4. PICK *
ATEK
\>>

And I use a real 49G. Don't have anything else.
Werner
08-08-2020, 03:51 PM
Post: #32
 3298 Member Posts: 179 Joined: Oct 2014
Martin, I seem to remember that some Saturn assemblers disagree on the direction to read the LA and LC parameters in. Some say the least significant nibble should come first, others (including MASD, which I use) say the most significant comes first. Since Werner's loaded constant is identical to one of mine, I'd bet he's using MASD too. If your assembler uses the opposite order (one of your earlier posts in this topic has some HPHP49-C artifact which would indicate PC-based development), then try reversing the digits. Getting the digits backwards would definitely lead to a wrong string, as they contain the always-1 bits.

By the way, if you used MASD and the full suite of development tools people tend to recommend for it (extable of course, Emacs, SDIAG, perhaps Nosy for a little ROM snooping, ...), you would have seen 6PICK pop up in Emacs' autocompletion. (Just type 6P and ask for a completion; if 6PICK didn't exist it wouldn't complete. This "just try it" route is much faster than looking it up in the docs manually.)
08-09-2020, 10:47 AM
Post: #33
 Martin Hepperle Senior Member Posts: 330 Joined: May 2014
Hmmm ... MASD / HP syntax?

Ahh ... Eureka! ... with the HPTools I have to use "CODEM" instead of "CODE" to switch to MASD syntax to assemble your code.
Due to my sloppiness I did not see this mistake.
If I had looked into the SASM output listing, I would have seen many
*** ERROR: Illegal mnemonic ***
messages in my CODE/ENDCODE block.
But foolishly I only looked into the listing of the RPLCOMP preprocessor, which was without any error messages (only in case of typos I found my undefined externals).

Interestingly SLOAD creates an .hp file in both cases.

Now everything works as advertised. Most of the time is now spent in my surrounding frame, not in this calculation anymore.

Thank you for your guidance and patience,
Martin
08-11-2020, 10:22 AM (This post was last modified: 08-11-2020 10:50 AM by Martin Hepperle.)
Post: #34
 Martin Hepperle Senior Member Posts: 330 Joined: May 2014
To support my brain in these bit-twiddling exercises, I created the attached Saturn reference sheet. The vertical lines are intended to show the possible interactions between registers.
If you spot any mistakes I am happy to update the sheet.
[...attachment removed...see below]
08-11-2020, 10:34 AM
Post: #35
 Werner Senior Member Posts: 696 Joined: Dec 2013
Reg A cannot interact with reg D directly. Only C can interact with all.
Werner
08-11-2020, 10:51 AM
Post: #36
 Martin Hepperle Senior Member Posts: 330 Joined: May 2014
(08-11-2020 10:34 AM)Werner Wrote:  Reg A cannot interact with reg D directly. Only C can interact with all.
Werner

- too bad, it was such a nice regular picture ;-)

[updated sheet attached.]
