Threaded Mode | Linear Mode

3298 · 05-25-2023, 11:18 PM

Apologies for the delay, but writing a wall of text this large takes a while...

This series of posts is meant to grant a look inside my SysRPL submission, with an emphasis on optimization. Fundamentally, there are two categories of optimization tricks here: general SysRPL ones, and application-specific ones. The latter category obviously doesn't teach much for other projects, but since there are alternative implementations for this particular problem, they may have a use anyway, beyond the high-level lesson to explore your topic thoroughly for shortcuts.
I also expect that you have a copy of the proof at hand for the application-specific tricks, since they are kind of math-heavy and tie into the its arguments, use its definitions, reference its dispatching cases, and so on.

Okay, let's dive into the source code, file by file. The first one is dedicated to a general SysRPL trick, which I lifted from Nosy: building a library with MASD directives instead of the separate CRLIB command. I have an unfinished utility library to assist with this, so I'll reserve the details for when I add the missing bits of input validation and customization so I can release it. But as the Nosy readme indicates, it lets you embed the code of one library command inside another's (with the condition that the embedded one can't be one of the named commands, due to library structure limitations), thus eliminating the ROMPTR you'd otherwise use in the spot you embed the code. A ROMPTR obviously uses up some memory, but the more important consideration is that it takes a substantial amount of time on execution, so you generally replace the call site with an embed where you expect the most "traffic", so to speak.
A separate benefit from this trick is that you always compile all pieces of the project in one go. No more accidental mixing new versions of some commands with outdated versions of others, just because you forgot to recompile them before calling CRLIB. Not a classical optimization, but by preventing mistakes, I'll call it an optimization anyway: one for developer time.
Since the tool to auto-generate and compile the code tying all the individual commands together isn't published just yet, I've bundled a copy of that generated code (lightly edited, because otherwise the library checksum needs to be adjusted after compiling)
into the source. Filename: compile

Code:

!NO CODE

!ASM

STROBJ 02B40 {

$(2) (title_end_-title_)/2

*title_

¢Sum3Pal: split a number into a sum of 3 palindromes in base-n > 4¢

*title_end_

$(2) (title_end_-title_)/2

EQU romid_ #859

$(3) romid_

G5 _hash $(5) 0 G5 _link G5 _cfg

*_hash

STROBJ 02A4E {

  $(5) 0 $(5) 0 $(5) 0 $(5) 0 $(5) 0 $(5) 0 G5 _hash_7 $(5) 0 $(5) 0 $(5) 0 $(5) 0 $(5) 0 $(5) 0 $(5) 0 $(5) 0 $(5) 0

  G5 _hash_idx

  *_hash_7

    *name_Sum3Pal_ $(2) 7 ¢Sum3Pal¢ $(3) 0

  *_hash_idx

    $(5) &-name_Sum3Pal_

}

*_link

STROBJ 02A4E {

  G5 Sum3Pal.s

  G5 ck1z.s

  G5 ZINTtoDigits.s

  G5 DigitsToZINT.s

  G5 AlgSList.s

  G5 subAsZINTthenAlgS.s

  G5 AlgS5Callback.s

  G5 pickType.s

  G5 BINTmod.s

  G5 AlgMain.s

  G5 makeEvenLenPal.s

  G5 makeOddLenPal.s

  G5 AlgIIandIIIcarry.s

  G5 AlgIVfinish2and4ii.s

}

!RPL

DEFINE Sum3Pal DOROMP CON(3) romid_ CON(3) 0

DEFINE ck1z DOROMP CON(3) romid_ CON(3) 1

DEFINE ZINTtoDigits DOROMP CON(3) romid_ CON(3) 2

DEFINE DigitsToZINT DOROMP CON(3) romid_ CON(3) 3

DEFINE AlgSList DOROMP CON(3) romid_ CON(3) 4

DEFINE subAsZINTthenAlgS DOROMP CON(3) romid_ CON(3) 5

DEFINE AlgS5Callback DOROMP CON(3) romid_ CON(3) 6

DEFINE pickType DOROMP CON(3) romid_ CON(3) 7

DEFINE BINTmod DOROMP CON(3) romid_ CON(3) 8

DEFINE AlgMain DOROMP CON(3) romid_ CON(3) 9

DEFINE makeEvenLenPal DOROMP CON(3) romid_ CON(3) 10

DEFINE makeOddLenPal DOROMP CON(3) romid_ CON(3) 11

DEFINE AlgIIandIIIcarry DOROMP CON(3) romid_ CON(3) 12

DEFINE AlgIVfinish2and4ii DOROMP CON(3) romid_ CON(3) 13

LABEL _cfg

:: DOBINT romid_ XEQSETLIB ;

CON(1) 8 CON(3) romid_ CON(3) 0

INCLUDE Sum3Pal

!ASM

% checksum manually fixed, was: $(4) 0

$1A78

}

@

There's a group of files controlling the tool that generated the code above. If you're familiar with CRLIB, you'll surely recognize these filenames; emulating most of CRLIB makes the transition easier for programmers, but my tool won't be fully compatible. The files are included here for the sake of completeness. Filename: $TITLE

Code:

Sum3Pal: split a number into a sum of 3 palindromes in base-n > 4

Filename: $ROMID

Code:

859.

Filename: $VISIBLE

Code:

{ Sum3Pal }

Filename: $HIDDEN

Code:

{ ck1z ZINTtoDigits DigitsToZINT AlgSList subAsZINTthenAlgS AlgS5Callback pickType BINTmod AlgMain makeEvenLenPal makeOddLenPal AlgIIandIIIcarry AlgIVfinish2and4ii }

Next up: the actual entry point from UserRPL (i.e. user programs and the command line) into this library. Since I merely chose the library structure for easy subroutines accessible from multiple other subroutines tucked away deeply in the call graph, the library has only this one visible command.
Many SysRPL programs expected to be called from UserRPL use CKn&Dispatch (where n is a digit from 1 to 5) to simultaneously check count and type of their arguments, and additionally register the currently running ROMPTR as the command "blamed" in case of an error (you'll see the name in the error message window, in front of the word "Error:", when the message itself comes after). These are useful commands because they do several useful things for the cost of a single command and a handful of BINTs, particularly for "overloaded" commands like + which have multiple separate meanings depending on the types of their parameters (e.g. concatenate strings, concatenate lists, append objects to lists, add numbers).
There are alternatives, though. A group of CAS entry points named ^CK1Z, ^CK2Z, and ^CK3Z show the way: pick a type (ZINT for these), and attempt to convert whatever the user passed into that type. On failure, complain about "Bad Argument Type". There's just one shortcoming for my use-case: When dealing with numbers in non-decimal bases - one of the arguments taken by Sum3Pal sets the base it operates in - supporting the HXS type would be desirable, but ^CK1Z and friends only accept ZINTs, reals, and (for some weird reason) strings that can be parsed into ZINTs. As we discovered recently in another thread, there isn't even a direct way to convert a HXS to ZINT without writing the ASM code for it yourself. My solution is a wrapper around ^CK1Z which converts a HXS to one of the other accepted types: since reals are potentially lossy, I went with strings as the intermediate format in this indirect conversion. Unlike Gerald's solution in the linked thread, mine simply decompiles the HXS with a word length of 64 in decimal mode and chops off the leading "# ". This fails for HXS longer than 64 bits, but the rest of the system doesn't really support them either, so I find that an acceptable sacrifice.
In case you're wondering about the label at the start of each source file, the library-in-MASD technique depends on these to calculate the offsets from the fixed parts of the library structure to the corresponding code. Having to maintain these is not ideal, but unavoidable - luckily, when I omit or forget to change the label in a renamed file, I get a fairly informative MASD error, so it's not a big problem. They are always in the format <filename>.s, so I'll omit the filename for the code blocks representing source files.
The real fun begins inside fromZINT obviously, but the arguments aren't just type-checked here - their order on the stack changes, and I squeezed some code in here which makes a copy of the base and converts it to a BINT (which we'll need later). The method to arrive at such an optimization is merely a second pass over the code - I had the conversion inside fromZINT previously, where I PICKed the number in ZINT format, converted it, and UNROLLed the result to its proper position (there was some more reorganization in there afterwards, but you get the point), and the obvious question was: is there a spot where the ZINT is conveniently located in stack level 1, and is there something to be gained from moving the conversion to that place? The answer was yes to both, and move it I did.
That conversion to BINT also checks its value range. The errors aren't quite what I'd prefer (that would be "Bad Argument Value" which would be in line with other non-CAS commands, instead we get "Negative integer" or "Integer too large"), but oh well, I'll take the free range checks. The upper limit of 9999 even gives me a good amount of headroom so I can do some arithmetic up to a few multiples of the given base without running into integer overflow problems.

Code:

LABEL Sum3Pal.s

::

  CK2

  ck1z

  DUP FPTR2 ^Z># ROT

  INCLUDE ck1z

  INCLUDE fromZINT

;

@

Code:

LABEL ck1z.s

::

  DUPTYPEHSTR?

  IT

  ::

    RunSafeFlags

    ::

      DODEC BINT64 dostws

      hxs>$

    ;

    BINT3 LAST$

  ;

  FPTR2 ^CK1Z

;

@

Before we get to fromZINT though, let's get a pair of helper functions out of the way. Inside the program, I'm representing the input number and the palindromes as metas (i.e. exploded lists) of digits in BINT form. These helper functions convert between that representation and ZINT. ZINTtoDigits expects the base on stack level 2 and leaves only the meta, DigitsToZINT wants the base to be in BINT form in the first NULLLAM and gives it back in ZINT form alongside the resulting ZINT. That difference comes from the places these are used and what is more efficient in those places.
Other than sacrificing pointless consistency in the call interface of subroutines intended for internal use only, the only lesson to be learned from these is to try several different orders of arguments to see which is the best one - in this case, ZINTtoDigits benefits from keeping the base in level 2 and the number to be converted in level 1, because even though ^Mod needs them the other way around (it needs to work on copies anyway, so we can fix the order for free by combining the 2DUP creating those copies with a SWAP), updating the remaining ZINT in each loop iteration (lopping off the least significant digit using ^Div) is easier with it in level 1. The base has to be kept around unmodified until the end, so we can just PICK it each time, and that needs one command regardless of level (as long as it's fixed and at most 10). Since there aren't more temporary items deposited on the stack at the time it's needed, we need 2 PICK, so let's recall the awesome signature of fellow forum member HP67: It ain't OVER 'till it's 2 PICK (yes, I had to incorporate that one somewhere Big Grin

) and turn it into OVER.

Code:

LABEL ZINTtoDigits.s

::

  BINT0 UNROT

  BEGIN

    2DUPSWAP FPTR2 ^Mod

    DUP FPTR2 ^Z># 5UNROLL

    FPTR2 ^QSub OVER FPTR2 ^QDiv

    ROT#1+UNROT

    FPTR2 ^DupQIsZero?

  UNTIL

  2DROP

;

@

Code:

LABEL DigitsToZINT.s

::

  1GETLAM FPTR2 ^#>Z Z0_ ROT

  ZERO_DO

    OVER FPTR2 ^QMul

    ROT FPTR2 ^#>Z FPTR2 ^QAdd

  LOOP

;

@

In fromZINT I'm setting up an environment consisting of a set of NULLLAMs and a virtual stack. Since I will be constructing the palindromes on the stack, keeping all the variables used in the process in the stack would make accessing them quite painful, so that environment is definitely needed. The virtual stack is pretty much the perfect place to store a number of values whose count is not known but won't change at runtime. Addressing by index from both ends is an additional perk that's very handy for this particular challenge. The rest goes into NULLLAMs. I reserved the second NULLLAM for a subroutine though, because 2GETEVAL is just better than a ROMPTR. All the other subroutines stay ROMPTRs though - nGETLAM EVAL is still faster and slightly smaller than a ROMPTR, but when wrapped into a secondary in the places where a singular object or command is needed, it loses the size competition by a substantial margin.
The comments describing all the pieces of the environment may be somewhat cryptic without the context they get used in. I wrote them primarily as a reference for myself to tell which one I need to use when writing the rest of the code. LAM1 is clearly the base given by the user (the only non-obvious aspect is that it's a BINT), LAM2 is half-explained above (I'll explain the subroutine's code separately below), and for the rest you'll have to search for their context.

There's also some error checking in here. In retrospect, I should have moved it up into the Sum3Pal source file with the rest of the error checks, but in terms of optimization it doesn't make a difference. I'm not changing it now, you get the code as submitted.
Much of the work to solve the challenge is hidden in the lines from the AlgSList call to the pickType call. Those lines use the digit count to extract the fitting subroutine from a list, and if it exists it's EVALed, else you get pickType as a fallback. This is a fairly decent rendition of a C-style switch() statement; if the non-default cases weren't consecutive, something based around Lookup instead of NTHELCOMP would be needed, but the benefits of omitting BINTnn #=casedrop (2 commands per case if referencing a builtin BINT) are reduced because you would need the BINTnn back.
Those lines receive the input digits meta as argument and are supposed to give back three lists. Around them, these arguments are converted from and to ZINTs. I use lists instead of metas here because as single objects they are easier to juggle around, and there's another few places where they need to be juggled, most of them connected to subAsZINTthenAlgS.

Code:

LABEL fromZINT.s

  FPTR2 ^DupZIsNeg? caseSIZEERR

  SWAPDUP BINT5 #< caseSIZEERR

  ZEROZEROZERO ZEROZEROONE

  ' INCLUDE computeCarryAndRem

DEFINE computeCarryAndRem 2GETEVAL

  8ROLL

  ' NULLLAM BINT8 NDUPN DOBIND

(LAM1=base)

(LAM2=helper for calculating)

(in mod base)

(LAM3=VStack index from bottom)

(index from top is INDEX@)

(LAM4=length of n)

(LAM5=index adjustment inside)

(main algorithm: 0 for type A,)

(1 for type B)

(also used to avoid switching)

(type B to A inside AlgV)

(LAM6=offset applied to)

(middle digits by AlgV)

(LAM7=main algorithm loop)

(iterations count)

(LAM8=callback for main)

(algorithm finish)

  INCLUDE ZINTtoDigits

  DUP 4PUTLAM

  PushMetaVStack

(VStack=digits of n)

(Bottom=most significant)

(Top=least significant)

(Stack=user-owned stack above)

(a copy of the VStack meta)

  INCLUDE AlgSList

  OVER NTHELCOMP

  ITE EVAL INCLUDE pickType

  BINT3 ZERO_DO

    ROT INNERCOMP

    INCLUDE DigitsToZINT

    SWAPDROP

  LOOP

  DropVStack ABND

@

AlgSList is, as mentioned, a list of subroutines. Specifically, it's a list of the subroutines constructing the palindromes for each of the "short" or "small" (take your pick, either of them works as an expansion of "S") input possibilities. It's a separate file because some of the higher-order ones need to refer back to their lower-order siblings. In the proof you can recognize these as the places where it says (for some palindromic template in place of $x$) "Then $n - x$ is the sum of two palindromes".
Calling one of these subroutines from another in these places is a no-brainer - but it has some implications for how I can return results with less than the full three palindromes. Just omitting the unneeded ones is a cardinal sin, of course (it doesn't give any way to tell at what stack level the non-palindrome items start). The actually viable options are a list or meta to bundle them up, or placeholder items for the unneeded ones. I found it to be more efficient if I use the latter option, with the appropriate representation of a zero as the placeholder. The otherwise unremarkable subroutine AlgS1 for 1-digit numbers showcases this: convert the meta on the stack (consisting of the digit and BINT1 as the digit count) into a list, and pad it with two lists containing one BINT0 each. Early on I determined that I wanted a valid digit in the placeholder, not just an empty list. That was before I switched the user-facing output format from digit lists to ZINTs, so an empty list would have been a user-visible inconsistency: AlgS1 (and some others) can produce a list containing a single BINT0 as one of their primary outputs too, alongside placeholders. The later switch to ZINT output format can hide this inconsistency, but I left this small optimization potential behind in case there's a need to hand the digit-list format back to the user again. Utilizing this potential would also have ripple effects throughout the higher-order "short" / "small" subroutines (because they usually pick apart the result of the call to lower-order ones to check for a third palindrome instead of checking all the conditions beforehand; I'll get back to that below), so it's a fairly invasive change with questionable benefits.

Code:

LABEL AlgSList.s

{

  INCLUDE AlgS1

  INCLUDE AlgS2

  INCLUDE AlgS3

  INCLUDE AlgS4

  INCLUDE AlgS5

  INCLUDE AlgS6

}

@

Code:

LABEL AlgS1.s

::

  P{}N

  BINT0 ONE{}N DUP

;

@

The 2-digit subroutine is less trivial, but still manageable. The proof lists three distinct cases; technically there should be a fourth to handle input $10$, but apparently the authors consider that piece of the puzzle so trivial it doesn't need to be mentioned at all (much like the whole 1-digit subroutine, actually). I'm lumping that case with the other instances of $(\delta+1)\delta$ initially. Those are handled in the nested secondary spanning multiple lines of source; the case $10$ gets some extra handling at the end of that secondary, after the caseDROP causes the other instances to drop out. That special handling kicks out the { BINT0 BINT0 } palindrome from level 3 and appends a placeholder instead; this is done to avoid the non-canonical representation of a 0 (again, in case the lists are handed to the user), and also to keep the palindromes sorted by magnitude. That's not only a more pretty result for the user, but it also simplifies processing the results in the higher-order subroutines (especially with a placeholder as 0-valued palindrome ending up in level 1 if it exists).
The remaining two cases share much of their code, but $\delta_1>\delta_0+1$ needs some massaging to get the values in line with the other case's expectations. Comparing the cases in the proof will easily reveal that the compensation needs to replace $\delta_1$ with $\delta_1-1$ and $\delta_0$ with $g+\delta_0$, so that's what the other nested secondary does. (Remember that the base we operate in, which the proof calls $g$, is stored in LAM1.) Sharing code between similar cases with an IT or ITE to take care of the differences will be a common sight in the other subroutines, better get used to it now; we'll also see some more messy instances of code sharing, including some runstream and return stack manipulation.

Code:

LABEL AlgS2.s

::

  NDUP #1- #=case

  ::

    #1-DUP TWO{}N

    1GETLAM #1- ONE{}N

    BINT1 ONE{}N

    4ROLL DUP#0<> caseDROP

    ONE{}N 4ROLLDROP

  ;

  2DUP#< IT

  :: #1-SWAP 1GETLAM #+SWAP ;

  SWAPOVER#-

  SWAPDUP TWO{}N

  SWAP ONE{}N

  BINT0 ONE{}N

;

@

With 3 digits we have six cases already. The case $\delta_2\le\delta_0$ is handled all the way at the end, and from inside the long nested secondary entered with IT the other two similar cases use its code as well (NOTcasedrop and ?SEMI are used to jump out to the shared code). Then, in a first relatively benign example for return stack fiddling, the RDROP (combined with a DROP, but looking for such combinations is just basic SysRPL optimization) removes that shared code from the return stack so the remaining three cases can do their job unimpeded. The #>2case with the following inner nested secondary sorts out one of them; the other two ($201$ and $100$) have some similarity again, so they are handled together at first, then the difference is applied: a placeholder palindrome for the $100$ case and a $101$ as largest palindrome otherwise.
Note also that the $g-1$ digit needed for the palindromes in four cases gets held onto throughout a series of checks. That's some minor recycling; there are numerous opportunities for that throughout this challenge. In several places it saves me the disposal of a "superfluous" item on a stack as well as putting the required item back on the stack where I need it, and all I need to do for it is some extra stackrobatics, alongside a proof that it's guaranteed to have the value I need. (That's easy here since I generate it from scratch; in other places I recycle leftover digits, and I need to ensure there isn't a corner case where they deviate from the value normally seen.)

Code:

LABEL AlgS3.s

::

  ROLL 2DUP#> IT

  ::

    1GETLAM OVER#+ 4ROLLROT

    OVER#0= NOTcasedrop #1-UNROT

    SWAPDROP 1GETLAM #1-

    4UNROLLROT #1- DUP4UNROLL

    #<> ?SEMI

    DROPRDROP

    DUP#1+ #>2case

    ::

      #1- SWAPOVER THREE{}N

      ONEONE BINT1 THREE{}N

      BINT0 ONE{}N

    ;

    SWAPDUP TWO{}N

    BINT1 ONE{}N

    ROT DUP#0=case ONE{}N

    ZEROOVER THREE{}N UNROT

  ;

  OVER#- ONE{}N UNROT

  DUPUNROT THREE{}N SWAP

  BINT0 ONE{}N

;

@

Before I present the next of the short / small number subroutines, let's first have a look at a helper it uses. It's the weirdly named subAsZINTthenAlgS I mentioned a few times already; the name is an attempt to succinctly describe what it does, but clearly I'm not very good at naming things.
Its job revolves around handling those cases where one of the AlgS subroutines calls another to build up part of its result, but that description doesn't tell you much about what its arguments or results look like. As a compromise, the name describes how it achieves its objective: taking a number in meta form on the stack (which is generally a snippet of the input number) and another number as a single BINT, which is produced by a callback taken from the runstream, it converts both to ZINTs, calculates the difference, converts that back to a meta representing the digits, and finally pulls the appropriate subroutine out of AlgSList and EVALs it. (Using COLA_EVAL instead of plain EVAL is just a matter of reducing the return stack depth while the subroutine is running - a free micro-optimization. In recursive programs this could turn into an important piece of the puzzle though: COLA implements what functional programming languages call "tail recursion".)
During the callback, the stack contains just the ZINT equivalent of the meta on level 1 and the ZINT form of $g$ on level 2. (By the way, do you see how $g$ is passed through from DigitsToZINT to ZINTtoDigits? That's not coincidence, it's iterative optimization applied to those two helpers to make it happen!) Beyond that, the stack contents are whatever the caller had on the stack above the meta. In many places calling subAsZINTthenAlgS, the callback makes use of that stack layout: for instance, a simple ROT as callback pulls down the item that was just above the meta and uses it as the number to subtract from what the meta represents.
Note also that the round-trip to ZINT and back not only tolerates leading zeroes, it even strips them out. That's how the lookup in AlgSList can yield a subroutine corresponding to a number length significantly shorter than the length of the meta. Thus, there's no danger of accidentally calling e.g. AlgS3 on a 1-digit number with two leading zeroes (which its algorithm isn't prepared to handle); in this example there will be a 1-element meta after ZINTtoDigits is finished, and AlgS1 will be called instead.
You may remember that the virtual stack is supposed to contain the digits of the number under examination. This isn't quite true during the subroutine call here (it contains the original number's digits, not the inner one's), but it doesn't matter because none of the AlgS subroutines touch the virtual stack. There's simply no need to correct its contents and restore it to the original afterwards for the outer number (or more efficiently, push another virtual stack and drop it afterwards). In case you're wondering why I ignore the virtual stack: inside each of the AlgS subroutines, the digit count is a fixed number (and a small one to boot), so it's more efficient to juggle the digits on the normal stack than picking them from the virtual stack could ever be.

Callbacks are a powerful pattern for optimization via code sharing, regardless of programming language: Since you're getting to execute arbitrary code, you can do basically anything inside the callback. Additionally RPL makes it easy to use this pattern, since it breaks down the barriers between code and data - more than most modern languages do, even the functional ones (except maybe its ancestor LISP), since you can just treat a list as a program and vice versa. SysRPL amplifies this power even further with runstream and return stack manipulation: you can fix up the shared code from inside the callback. AlgS5Callback has an example for that.

Code:

LABEL subAsZINTthenAlgS.s

::

  DigitsToZINT

  'REVAL FPTR2 ^#>Z FPTR2 ^QSub

  ZINTtoDigits

  AlgSList OVER NTHCOMPDROP

  COLA_EVAL

;

@

From AlgS4 on the proof labels the cases (and usually sub-cases too), which is very handy because it means I can refer to the cases with source comments. Such comments are short enough to not be in the way when reading the source on the small screen of a 50g, but combined with the proof they do a pretty decent job at explaining what each piece of the code is responsible for. That's just the amount of inline comments I want.
In 4-digit numbers referring back to the smaller cases via subAsZINTthenAlgS plays a very central role. Case i is the one actually doing that, taking the most significant digit $\delta_3$ and attempting to split off $\delta_{3}00\delta_3$ as the biggest palindrome. The other cases handle various failures in doing that: iv and v come into play when subtracting $\delta_{3}00\delta_3$ would go negative, so I sort them out before the subtraction. ii and iii are the ones where the difference breaks down into three palindromes instead of the usual two (or rarely, just one). To find these I just optimistically break it down and check if the third one (here's where keeping them sorted by magnitude shines) is a placeholder. If yes, leave it on the cutting room floor and get out of there.
The meta fed into subAsZINTthenAlgS contains the three digits $\delta_2\delta_1\delta_0$, and the callback pulls $\delta_3$ down. That's about as straightforward as a sample use of subAsZINTthenAlgS can get.
Distinguishing ii and iii is then done by taking a hard look at the proof's figures for the 3-digit case $201$ and the 2-digit case $(\delta+1)\delta$ and picking a difference between them which can be efficiently tested for. The third palindrome has been used up to check if it was a placeholder (and its value was $1$ either way), leaving $\delta\delta$ and $(g-1)$ for ii, or $101$ and $(g-1)(g-1)$ for iii. I settled for checking the bigger palindrome's second digit ($0$ for ii, $\delta\ge 1$ for iii), but reading the code again just now I see a better alternative: instead of blindly extracting only the first digit of the smaller palindrome ($g-1$ for both, which I can recycle - by extracting it before the decision the extraction becomes shared code), I could convert that smaller palindrome into its meta form, use its length for the decision, and DROP the extra $g-1$ at the start of case ii. That can be done by replacing CARCOMP OVER TWONTHCOMPDROP_ #0=case with INNER#1= NOTcasedrop (saving two commands, i.e. 5 bytes). You see, at this level of optimization even someone with a good amount of practice can miss a trick. It's not a trivial business, after all.

ii has three unnamed sub-cases, two of which are eerily similar (those for $\delta_3=1$ and $\delta_3=g-1$). Using shared code for these is an obvious move - and mathematically I could apply the same template to the third sub-case as well, but I've set a limitation for myself: no deviation from the proof's results, even if it makes the code better.
iii is divided into named sub-cases iii.a and iii.b, and the former in turn has two unnamed sub-cases. The first of those is basically identical to iii.b (apart from the exact digits the input number had, which are irrelevant at this point), so I want to get these handled by the same code. As it turns out, the test that separates the sub-cases of iii.a also works correctly for iii.b because with $1\le\delta\le g-2$ and $\delta_3=1$ (the condition for the other sub-case of iii.a), the condition for iii.b could never be satisfied: $\delta_3+\delta\le 1+(g-2)=g-1$ is clearly less than $g$, therefore it can't be $g+\delta_0$ for a valid (i.e. non-negative) digit $\delta_0$. That is to say, testing for the condition distinguishing the sub-cases of iii.a is sufficient. The absence of code comments mentioning iii.a and iii.b is therefore explained by them using the exact same code, which I simply gave the label representing both together, iii.
One of these unnamed sub-cases recycles an entire palindrome at once, $\delta\delta$. Assuming you can call that "recycling", that is. You could also call it "fixing up the other palindromes around it".

Code:

LABEL AlgS4.s

::

  PICK OVER#<

  2OVER #+ #0=

  ANDcase

  ::

(iv and v)

    UNROT2DROP #1-

    1GETLAM #1-SWAP

    DUP#0=csedrp

    ::

(v)

      DUPDUP THREE{}N

      BINT1 ONE{}N

      ROT ONE{}N

    ;

(iv)

    2DUPSWAP 2DUPSWAP BINT4 P{}N

    4UNROLL #- #+ ONE{}N

    BINT1 ONE{}N

  ;

(i,ii,and iii)

  DUPZERO 2DUPSWAP BINT4 P{}N

  5UNROLL 4UNROLL

  BINT3 subAsZINTthenAlgS

  ROT

  CARCOMP #0=?SEMI

(ii and iii)

  CARCOMP

  OVER TWONTHCOMPDROP_ #0=case

  ::

(ii)

    SWAPDROPSWAP CARCOMP

    2DUP#= OVER #1= ORcase

    ::

      ONEONE 3PICK BINT4 P{}N

      SWAP#1- DUP TWO{}N

      BINT3 ONE{}N

    ;

    #1-SWAP 2DUPSWAP BINT4 P{}N

    #TWO#ONE BINT2 THREE{}N

    BINT0 ONE{}N

  ;

(iii)

  ROT CARCOMP

  DUP#1= casedrop

  ::

    DUPDUP THREE{}N

    SWAP CARCOMP #1+DUP TWO{}N

    BINT1 ONE{}N

  ;

  #1-SWAP #1-

  2DUPSWAP BINT4 P{}N

  BINT1 BINT3 BINT1 THREE{}N

  ROT

;

@

AlgS5 refers back to the smaller cases just like AlgS4, and of course it does so via subAsZINTthenAlgS again. However, this time the largest palindrome has two non-zero digits at its ends instead of just one. That means the callback needs to construct a two-digit number, so I would have to artificially limit the base to about $\sqrt{2^20}=2^10$ to avoid integer overflow issues (it's a little higher due to the exact digits that go into the number), a significant step down from the 9999 limit the program actually uses. Since I wasn't willing to admit defeat over the corner case (relative to the entire program) of 5-digit numbers, I conjured up a solution: constructing that two-digit number as ZINT instead.
That creates other problems though: subAsZINTthenAlgS wants to convert the callback's result from BINT to ZINT, and the more significant digit is also supplied as a BINT on the stack. I circumvented both at the same time with a trick that's either very elegant or a massive hack, depending on viewpoint: knowing the exact code of subAsZINTthenAlgS, I see that the command to convert from BINT to ZINT will be the first element in the runstream belonging to the top entry of the return stack while the callback runs. I can consume that and simultaneously put it to good use on that single digit with 'REVAL. The 4PICK before it fetches the digit to be converted, and the rest of the callback appends the other digit, which is always $1$.
Such return stack manipulation is quite mind-bending, and it can easily become a maintenance nightmare. I spent a significant fraction of this callback's source on a comment documenting what is going on, just to avoid that outcome. And yes, the comment contains a typo (capitalization of subAsZINTthenAlgS). Deal with it.
The callback has its own source file instead of residing inline in the parent subroutine because it's used in two separate places in AlgS5. (Is code reuse getting out of hand when not just the callback-using functions, but even the callbacks passed to them are shared? I think not, it's just comprehensive optimization.)

Code:

LABEL AlgS5Callback.s

::

  4PICK 'REVAL

  (the previous command uses up)

  (the ^#>Z following the 'REVAL)

  (inside subAsZINTThenAlgS)

  3PICK FPTR2 ^QMul

  Z1_ FPTR2 ^QAdd

;

@

For 5-digit numbers, the first order of business is to redirect all input numbers starting with a digit other than $1$ into the code nominally responsible for numbers with 7 or more digits. The proof claims that there's no problem with that, and for version 1 that's true, but in version 2 it isn't. As we found out a few pages back in this thread, you can see the problem in decimal base with input 92805; the easy remedy is to use version 1 of the proof for Algorithm I and version 2 for the rest, because a whole bunch of holes in the logic were fixed between versions. The changes applied to Algorithm I actually hold up for 7 or more digits, so a hypothetical version 3 of the proof could present the reverse-adjustments as a 5-digit-specific addendum, like the addendum to Algorithm II in 6-digit numbers. In my implementation the addenda are weaved into the main algorithm, not into these subroutines for short / small numbers, so I'll get back to them in that context.
That is the only notable hole left in version 2. Apart from a cluster of typos in cases v.d and v.e of 6-digit numbers starting with $1$ (which avoids invalid output by pure luck; I'll get back to that below) all the other problems are quite minor (like a handful of wrong input digits listed in some figures, missing explicit definitions for some variables used, some phrasing in Algorithm V that fails to reflect changes done between versions 1 and 2, ...) and can generally be resolved by any reader who notices them with relatively little effort.

The case structure of the proof's handling of 5-digit numbers is built around two attempts to split off a 5-digit palindrome and each time hoping that the rest can be represented by two palindromes. Since all 5-digit numbers starting with anything other than $1$ are handed over to Algorithm I, the outermost digits of the 5-digit palindrome have to be $1$ as well. That leaves the second and fourth digit (which must have the same value to make a valid palindrome), and the central third digit, i.e. two independent digits. The various cases result in a variety of combinations for them with little in common between them; I managed to share the code snippet building the 5-digit palindrome anyway by constructing it last, outside the cases, from the two parameter digits.
Therefore the cases are dispatched inside a long embedded secondary. Exiting it means jumping to the snippet which builds the 5-digit palindrome; the two "failure" cases (iv and v) which don't use a 5-digit palindrome remove this snippet from the return stack with RDROP. (These also have more in common than just that, but the code to construct their common pieces is so short and their positions in the branching structure of the code so awkward that I couldn't find an actually worthwhile way to make them share code. You can take this as a reminder that you need to evaluate whether an "optimization" is an improvement or actually detrimental.)

The first attempt is with the palindrome $1\delta_{3}0\delta_{3}1$, where i poses as the "success" branch, ii, iii, and iv take care of the 3-palindrome remainders, and v, vi, and vii treat the would-be negative remainders. Most of the latter move on to the second attempt: $1(\delta_3-1)(g-1)(delta_3-1)1$. v is the one number where constructing that second attempt's palindrome isn't possible due to an invalid digit ($delta_3-1$ would be negative). In this second attempt, vi is the "success" branch, and vii takes care of the few possible failures: the subtraction can't produce a negative number this time, and the 3-palindrome remainders are limited to the 2-digit template $(\delta+1)\delta$.
Just like in AlgS4, both attempts at using subAsZINTthenAlgS are optimistic, as in: "failure" branches covering a 3-palindrome remainder are sorted out afterwards. For the first attempt (the second attempt doesn't need to handle this), telling 3-palindrome remainder $201$ (case ii) apart from the others can again be optimized further: replace UNROTOVER TWONTHCOMPDROP_ #0=case with UNROT INNER#1= NOTcasedrop and delete the nearest CARCOMP after that (i.e. the one between ii and iv) to save one command.
Case iii has sub-cases iii.a and iii.b, but they merely differ in their input digits, which are irrelevant as far as my code is concerned.

Code:

LABEL AlgS5.s

::

  OVER #1= NOTcase pickType

  2DROP

  ::

    3PICK 5PICK #0=?SKIP #1+

    OVER#> 3PICK #0<> ORcase

    ::

(i,ii,iii,and iv)

      4UNROLL BINT0 4UNROLL

      BINT3 INCLUDE subAsZINTthenAlgS

      INCLUDE AlgS5Callback

      CARCOMP #0=?SEMI

(ii,iii,and iv)

      ROTDROP BINT1 UNROTOVER

      TWONTHCOMPDROP_ #0=case

      ::

(ii)

        DROPZERO ONE{}N

      ;

      CARCOMP 4PICK #0=case

      ::

(iv)

        RDROP 2SWAP 2DROP

        BINT4 NDUPN P{}N SWAP

        CARCOMP #1+DUP TWO{}N

        BINT1 ONE{}N

      ;

(iii)

      SWAP4ROLL #1- 4UNROLL

      CARCOMP #1+DUP ONE{}N

      UNROTOVER THREE{}N SWAP

    ;

(v,vi,and vii)

    1GETLAM #1- 5UNROLL

    DUP#0=case

    ::

(v)

      4DROP RDROP

      BINT4 NDUPN P{}N

      BINT1 ONE{}N

      BINT0 ONE{}N

    ;

(vi and vii)

    #1- 5UNROLL DROPONE

    BINT3 subAsZINTthenAlgS

    AlgS5Callback

    CARCOMP #0=?SEMI

(vii)

    DROPSWAP #1-SWAP

    INCOMPDROP #1+

    ONEONE ROTSWAP THREE{}N

    SWAP#1- ONE{}N

  ;

  2SWAP OVER ONEONE 5UNROLL

  BINT5 P{}N UNROT

;

@