Threaded Mode | Linear Mode

3298 · 05-25-2023, 11:19 PM

For the final one among the subroutines for "short" / "small" numbers I'm using a pair of small helpers. They are principally built for the main algorithms, so you'll see them again there, but as they are generic enough, I was able to use them for AlgS6.
BINTmod is mostly a wrapper around the builtin command #/ for the places where only the remainder of a division by \(g\) is needed. As such, it is roughly equivalent to the \(D(n)\) notation in the proof, except I try optimize it out.
As part of the wrapper, \(g\) is recalled from its LAM1, and there's an additional quirk: before dividing, I add \(2\cdot g\) onto the dividend. This seems useless, especially because the quotient is DROPped at the end. But it has a purpose: the parameter can go a bit below \(0\) without unsigned division ruining the day. Placing the remedy into this shared helper may hurt performance a little bit in cases where it's not needed but in the limited memory of all RPL machines code size is generally more important.
computeCarryAndRem performs a somewhat more complex operation. It takes two parameters from the stack and adds them together (because it generally operates on a sum, so I'm sharing the #+ by shoving it in here). This sum is the sum of two palindromes' digits which line up under each other, plus (if applicable) any incoming carry. Then it uses a callback to fetch the digit of the input number which they and the third palindrome's corresponding digit have to add up to, and calculates that missing digit. As a highly useful byproduct, the outgoing carry is generated as well.
This helper is kind of optimistic: most of its code is built as if determining the missing digit is a matter of simply subtracting the input digit from the remainder of the division that splits off the outgoing carry. However, just before the subtraction happens, it examines if that would result in a negative number; if so, carry and remainder are adjusted accordingly.
As mentioned in fromZINT, computeCarryAndRem is stored in LAM2 instead of being available via ROMPTR. This has a notable side effect: 2GETEVAL is implemented in ROM as :: 2GETLAM EVAL ; - note the EVAL where I would like to see a COLA_EVAL. Since computeCarryAndRem takes a callback from the caller's runstream, I need to RDROP the extra return stack level pointing to right behind the EVAL inside 2GETEVAL. Then I can reach the runstream which called 2GETEVAL, and grab the callback from there. If computeCarryAndRem wasn't called via 2GETEVAL everywhere, this RDROP would likely do unwanted things. I felt the need to place one of my rare code comments here so I wouldn't forget about it in case I gave the LAM2 slot to another helper.

Code:

LABEL BINTmod.s

::

  1GETLAM #2* #+

  1GETLAM #/ DROP

;

@

Code:

LABEL computeCarryAndRem.s

::

  #+ 1GETLAM #/ SWAP

(get rid of the extra return)

(stack level inside 2GETEVAL)

  RDROP

  'REVAL 2DUP#> NOTcase SWAP#-

  ROT#1+UNROT 1GETLAM #+ SWAP#-

;

@

AlgS6 is the big bad one. It starts out nicely, with a redirection towards the long-number code if the first digit isn't \(1\) (just like AlgS5; the addendum to Algorithm II is baked into that code, so we'll see it when we get there). But the devil is in the details, as the template of referring back via subAsZINTthenAlgS is reduced to a much smaller role than it had in AlgS4 and AlgS5. In fact, there is not one but two additional templates. One of them uses a pair of 5-digit palindromes and a third shorter one - this template does the heavy lifting. The remaining one is responsible for cases iv.b, v.d, and v.e, about as many different input numbers as the subAsZINTthenAlgS template (with cases v.a, v.b, and v.c), but it is much harder to comprehend (in my opinion anyway), and therefore harder to construct working code for, let alone optimized code.
To those who followed the thread throughout the challenge, the template with the pair of 5-digit palindromes may seem somewhat familiar as Albert Chan's sketches are also based on a pair of odd-length palindromes accompanied by a shorter one. The proof's authors apparently found a number of edge cases which require adjustments or other templates altogether; I can't tell whether that is due to inherent limitations of this template or a consequence of the short length of numbers - because that's clearly what keeps the proof's arbitrary-length algorithms from being used on these.

The digits of the 5-digit palindromes are calculated in pairs, each from a sum they need to add up to, and for the pair made up by the outermost digits of both palindromes, an additional requirement that neither may be \(0\) (for obvious reasons). Other than this, the proof gives freedom of choice to the implementation - one of only two places it does so. (The other one is located in pickType.) I previously suggested splitting the sum by integer division by 2 (with the remainder getting added onto one of the halves), but in the end I use that choice only for the third pair (representing the central digit of each 5-digit palindrome). The other pairs have in common (across all cases) that their sums consist of a digit from the input number and a constant, and it turned out to be slightly better to use these components more or less directly as members of the pair. The constants are generally \(0\) or \(g-1\), even - that makes it easy to keep the pairs (and by extension, the palindromes) sorted by magnitude, as they are the extreme values a digit can take. The constants can be simply allocated to one of the slots in the pair with no danger of getting the order wrong.
There are three situations where this procedure leads to problems the integer division procedure would dodge without any effort:
- In case iv.a the constant for the second pair is \(-1\), i.e. not a valid digit. The way to avoid it is decrementing the variable digit and setting the constant one to \(0\).
- In cases i, ii, and iii (other iii.d and the special case in iii.c, which both abandon the template) the first pair is not allowed to contain a \(0\), so if the variable digit is \(0\), it is incremented to \(1\) and the constant decremented to \(g-2\) instead.
- In case iv.a, the constant for the first pair is \(g\), i.e. not a valid digit (again). You can likely guess that can be resolved: increment the digit, and decrement the constant to \(g-1\). This would also avoid the previous problem with a \(0\) in the first pair. However, my implementation happens to have access to the values chosen for i, ii, and iii, so it recycles them by incrementing the variable digit and retaining the constant. If the input digit was \(0\), it is therefore double-incremented to \(2\) and the constant will be \(g-2\).
Why are these corrections worth passing up the easy path of integer division by 2? For one, cases iii.a, iii.b, and iii.c reduce the freedom of choice and explicitly set the first pair to values that match this procedure's result but don't match the integer division's. (Mathematically, either of them works out but I want to match the proof's results as closely as I can.) The other incentive is that in cases iv and v the input digit used for this pair (\(\delta_4\)) is needed for extra checks - with the integer division, it would need to be reconstructed (by adding the pair back together and subtracting the constant away again) or retained (leading to a superfluous value for other cases that needs to be disposed of). With the procedure I'm using, the decision between iv and v can reuse the pair member already on the stack, because even though the correction obscures the difference between \(\delta_4=0\) and \(\delta_4=1\), it's irrelevant there. Inside v, I bite the bullet and reconstruct \(\delta_4\) from the sum.

Regarding the case hierarchy, i is the default case, with a 3-digit palindrome as the third one. Cases ii and iii take care of some minor concerns that come up when the sum of the first pair forces the third palindrome's last digit (and by extension, first digit) to be \(g-1\). (Actually, ii with its extra condition \(\delta_2\ne 0\) distinguishing it from iii essentially says: problems avoided, fall back to i. It's just iii that is deviating, with different sums for
the second and third pairs and a 4-digit third palindrome.) iv and v are the cases where the third palindrome's last digit comes out as \(0\) (which obviously isn't allowed there, just like in the first pair). iv.a saves what it can and still manages to apply the same template with some minor adjustments to the pair sums.
With all this, it makes sense to share as much as possible of the code constructing this template across all branches. Additionally, since the outermost digit of the third palindrome is so central for the decision which branch to take, I'm optimistically calculating it and the first pair (which it more or less depends on), and only then I use copies of it to find out which case applies.
After the dispatching is done via an assortment of nested conditions, the construction of palindromes according to i resumes. ii and even iv.a use the same code, but iii.a, iii.b, and iii.c need to skip part of it (quite literally, with the SKIP command), so that part is wrapped in a secondary. (The comment after that is slightly wrong, it should mention iv.a instead of all of iv. Why do I keep discovering mistakes like that after I'm officially done?)

That's most of the code dealing with a pair of 5-digit palindromes explained. The only piece left is inside iii, but there's not much to it. In iii.a, iii.b, and iii.c, the formulae for the palindromes' digits change due to switching to a 4-digit third palindrome, but the procedure is equivalent to what the corresponding piece of cases i, ii, and iv.a does (which is the part that gets SKIPped): it calculates the second pair for the 5-digit palindromes, from there the third palindrome is completed and bundled up into a list, and the sum of the last pair of digits is prepared. The special case \(c_2=2\) in iii.c appears as if it is present for iii.a and iii.b as well, because these three cases share all their code, but with the input digits that lead to those cases, \(c_2\) can't ever be \(2\). That's why I can place that special case there without specifically checking for iii.c.
As should be a familiar ritual by now, that special case and iii.d both get rid of the shared code for the pair of 5-digit palindromes waiting in the return stack via RDROP. The special case is for a single digit combination, so it clears almost everything off the stack and constructs the correct palindromes. (A \(g-1\) is recycled, that's all.)
iii.d has some superficial similarity to iv.b, v.d, and v.e with palindromes of length 6, 5, and 3, but due to all the differences and iii.d being out of place (on the wrong branch of the overall dispatcher), I found that it's not worth trying to share code. Instead I just roughly followed the proof's order of which values to calculate: \(c_4\) comes first, then \(c_2\) and \(z\) together. In the process, the value \(1-c_4\) is more useful to hold onto than \(c_4\) by itself (and once it's calculated, the whole first palindrome is done and dusted). The \(c_2\) and \(z\) happen to line up with what computeCarryAndRem can calculate. The rest of the case's code is simple enough to not need extra explanation, I think.

Now, onward to iv and v. As one of the branches in the top-level ITE, this branch sits in its own secondary, and its two cases are distinguished by another ITE. There's some shared code responsible for a good chunk of the headache-inducing template after it - iv.a avoids it with an RDROP in its branch so it falls back to the pair-of-5-digit-palindromes code again. When that shared code does get run, however, it returns the favor with an RDROP of its own, which tosses the pair-of-5-digit-palindromes code off the return stack.
Let's leave that headache for last, though, because in v.a, v.b, and v.c the old template based on subAsZINTthenAlgS reappears. Much like the first attempt in AlgS5 it splits off a palindrome starting with the same two digits as the input number. However, due to the conditions for reaching this template, the "failure" cases get handled (mostly) individually, instead of a second attempt. This leads to a very differently shaped code, consisting of my take on a C-style switch() statement, as seen before in fromZINT: a list of secondaries, accessed by NTHELCOMP. The "default" case is v.d and v.e together, representing the not-yet-discussed template.
The secondaries in the list correspond to one each of the cases v.a, v.b, and v.c. As members of one template, these cases share much of their code (most importantly, the subAsZINTThenAlgS call), which I lumped with the EVAL running the secondaries. There's also 2RDROP in there, tossing the return stack entries for shared code of both competing templates in one stroke.
These secondaries themselves return the last digit of the input number, as required by the conditions needed to reach this template. This is needed because the digit is used up while checking the conditions, and I can't do much about that without disrupting the code for the pair-of-5-digit-palindromes template. The secondaries also handle all the nameless special cases, with two exceptions: those which are just redundant examples for the shared subAsZINTThenAlgS code (recognizable by their first palindrome; the proof merely lists them explicitly because its authors apparently want to keep the conditions for recursing into lesser AlgS subroutines simple), and the single special case where subAsZINTthenAlgS returns a 3-palindrome result (\((g-1)(g-2)=(g-2)(g-2)+(g-1)+1\)). The latter is from v.a but since it's easier to catch after the subAsZINTthenAlgS call, it's handled in the shared code.
The special cases make for bulky but not interesting code. Note that there are some among them which could mathematically be allowed into subAsZINTthenAlgS (namely \(110100\) from v.b as well as \(120101\), \(120111\), and \(120021\) from v.c), but the proof expands them into different palindromes. As usual, I replicated the proof's results exactly instead of going for shorter code - otherwise I could have reduced the special cases in v.b and v.c to two each: \(1100010\) and \(110000\) as well as \(120011\) and \(120001\).
Note that even though the number subAsZINTthenAlgS subtracts from the shortened input number consists of two digits again, I can get away with BINTs instead of the ZINT trickery used in AlgS5Callback. This is because this time the higher digit is at most \(2\) (in v.c, to be precise), and \(21=2\cdot g+1\) still fits into a BINT even when \(g\) is 9999.

For the headache-inducing template (which, by the way, doesn't even exist in version 1 of the proof; that's one of the big gaps fixed by version 2), I was practically forced to trim down the formulae for the digits and/or replace them entirely with alternatives which yield the same results. The first target is obviously the intermediate value \(y\), which is entirely dependent on \(\delta_1\). The next one is \(c_2\), which is also entirely dependent on \(delta_1\), but it's harder to spot that because part of this dependence is indirect via \(y\).
There's also a cluster of typos in this area. You'll see them if you recall that \(c_1\) is an ordinary carry. iv.b gets away without typos, but in v.d the definition of \(c_1\) given below the figure doesn't match the digits in the corresponding column in the figure, which are: \[(1\text{ (incoming carry)}\\+0\text{ (first palindrome's digit)}\\+(y-c_2+1+c_1)\text{ (second palindrome's digit)}\\+(D(\delta_1-1-y)+(c_2-1)-c_1)\text{ (third palindrome's digit)}\\-\delta_1\text{ (remainder of division = input digit)}\\)/g\]That simplifies to \(c_1=(1+y+D(d_1-1-y)-\delta_1)/g\); note the discrepancy in the constant, where I get a \(1\) and the proof's definition has a \(2\). This is fortunately inconsequential if we do the sensible thing and treat the division symbol as integer division, because the difference will just result in a discarded remainder of \(1\) instead of the \(0\) which the proof usually aims for by subtracting the input digit \(\delta_1\) (the expected and desired remainder) beforehand. You'll only run into trouble if you skip this normally superfluous subtraction and receive \(g-1\) as input digit - that would combine with this typo's offset into \(g\), which is just enough to survive the implicit floor() part of integer division.
In case v.e the situation is even more serious. This time there is a typo in the equivalent spot but the other way around, another inside the \(D(\text{...})\) part of the definition, and a third in the definition of \(y\). In all three places there is a \(1\) where a \(2\) should be. Since this is the other way around from v.d, the first of these three by itself would be catastrophic (wrapping the remainder around to \(g-1\) and decrementing the quotient), but the second one mostly compensates for that. There's a slight concern left about the values of \(\delta_1\) and \(y\) that cause \(D(\text{...})\) to wrap around to \(g-1\), but that's where the third typo in the definition of \(y\) makes that combination of values trivially impossible. That leaves up for debate what consequences this wrong value for \(y\) has elsewhere, and the answer is: not much. It alters all non-constant digits in the palindromes, but it doesn't break anything: the definition of \(y\) is obviously intended to ensure that \(D(\delta_1-2-y)\) isn't at the lower or upper end of the valid digits range (so that the difference between \(c_2\) and \(c_1\) doesn't cause the third palindrome's middle digit to go out of range), and now it can be \(0\) if \(\delta_1=3\). That input value causes \(y=1\) due to the third typo, but luckily these values cause \(c_1\) to be \(0\) as well, avoiding the possibility of an invalid digit. Crisis averted, but the proof by itself has a gaping hole in its logic.

These typos were supposed to be circumvented in my implementation, but I didn't manage that completely - the wrong \(y\) in v.e remains. Between the typos, this implementation quirk, and the general complexity of this template's formulae, the resulting optimized code is pretty much unreadable to anyone except myself as its author, and even I am having trouble understanding it as I'm revisiting it while writing this. Therefore I'll give a stack diagram at a pivotal point in the code, as well as a bunch of alternative formulae for important values - I cooked those up while coding, and they are supposed to line up better for optimized implementation (though they are not suitable for use in the proof's conclusions, so I get why it uses the formulae it does).

Both branches of this template - v.d and v.e are handled by mostly the same code in one branch, iv.b is the other branch - start from a clean slate with only the still-relevant input digits on the stack. That is, \(\delta_1\) is in level 3, \(\delta_2\) in level 2, and either \(\delta_3\) (for iv.b) or \(\delta_4\) (for v.d and v.e) in level 1. The iv.b branch needs a 4DROP to make it so, for the other branch this is already the state of the stack on entry. Both branches then proceed to compute the two values dependent on \(\delta_1\) only - that's the first ITE in both. For v.d and v.e it actually computes \(y+c_1\) and \(c_1\), because in the palindrome digit I derive from \(y\), that branch needs \(c_1\) added on top of it. iv.b uses \(y\) by itself in that digit's formula, but it generates \(1+c_1\) in place of just \(c_1\). (iv.b additionally defines a value nearby called \(x\), but my code postpones computing that until it's needed.)
The formulae used are:
- for iv.b, \(y=\begin{cases}1&\text{if }\delta_1=0\\\delta_1&\text{if }1<\delta_1<4\\1&\text{otherwise}\end{cases}\)
and \(c_1=\begin{cases}1&\text{if }\delta_1<4\\0&\text{otherwise}\end{cases}\)
- for v.d and v.e, \(y=\begin{cases}\delta_1+1&\text{if }\delta_1<3\\0&\text{otherwise}\end{cases}\)
and \(c_1=\begin{cases}1&\text{if }\delta_1<3\\0&\text{otherwise}\end{cases}\)
You can see how nicely the conditions for both values computed in one branch line up with each other, making it possible to generate both values in the same ITE. That's not surprising if you know that an intermediate step in the transformation process looks like \(c_1=\begin{cases}1&\text{if }\delta_1<k+y\\0&\text{otherwise}\end{cases}\) (with \(k=3\) for iv.b, \(k=1\) for v.d and v.e), and from there you can try out all the branches of my \(y\) definition and plug input / output pairs into this one to find out that the "otherwise" branches line up.
If I had paid enough attention to typo correction, the formulae for v.e would have lined up more with iv.b from the other branch than its partner v.d. (For \(y\) that's obvious from the proof's definitions, for \(c_1\) you'd have to set \(k=2\) in the intermediate equation and conclude that it's indistinguishable from \(k=3\).) That would have resulted in a very different implementation, due to different optimization opportunities. I may fix it in the future, but as the analysis above shows, the resulting palindromes are still valid. I can also claim that it matches the proof's results, as "wrong" as they may be.

The next values calculated are a trio of easily calculated final palindrome digits: the one between the first palindrome's outermost and central digits (\(x_2\) in the terminology seen in the proof's description of the pair-of-5-digit-palindromes template and in the long-number algorithms), and the outermost digit of the other palindromes (i.e. \(y_1\) and \(z_1\)). In iv.b the latter two are constants, and the former has one of two values (\(3\) or \(2\)) depending on \(c_3\). That in turn boils down to a simple ITE as it is: \(c_3=\begin{cases}1&\text{if }\delta_3<y\\0&\text{otherwise}\end{cases}\)
In the other branch the paths for cases v.d and v.e finally diverge. v.d is essentially implemented as an "error-handler" for when v.e finds that its value for the second palindrome's outermost digit \(\delta_4-3\) is \(0\) (i.e. invalid in that position). In v.e the other two values are constants - and in v.d all of them are constants too, but with different values (I manage to recycle two values, a constant \(2\) and \(\delta_4-3=0\), by shuffling them to other constant slots).
I also set a value \(x_2-c_0\) which is in most cases a duplicate of \(x_2\), except in v.d which has constant values that result in a carry \(c_0=1\) from the last column of palindrome digits. v.e's value for that is the \(2\) that gets recycled in v.d.

In addition to the constant shuffling and the carry from the palindromes' outermost digits, v.d alters \(y+c_1\) to \(y+c_1+1\). In the digit formulae this \(+1\) is meant to counter the shifted value range of \(c_2\) compared to what that intermediate variable evaluates to in v.e, so I ought to adjust that. However, with the selection of numbers I compute via the proof's formulae or my alternative versions (the rest I hand off to BINTmod or computeCarryAndRem instead), it turned out to be easier to attach the \(+1\) to this intermediate value.

As the final steps before dropping into the code shared among all three cases, \(z_1+c_1\) is prepared (iv.b has no need for this as in that case \(z_1\) is always \(1\) and it computed \(1+c_1\) instead of \(c_1\) ahead of time), and \(\delta_3\) is brought to stack level 1. In the branch covering v.d and v.e that's a constant \(0\).

On entering the shared code the stack therefore contains:
9: \(\delta_1\)
8: \(\delta_2\)
7: \(z_1+c_1\)
6: \(y\) for iv.b, \(y+c_1+1\) for v.d, or \(y+c_1\) for v.e; in other words, \(y_2+c_2\)
5: \(z_1\)
4: \(x_2+c_0\)
3: \(y_1\)
2: \(x_2\)
1: \(\delta_3\)
The order looks jumbled, and it kind of is, but that's the most efficient one I could find.

The shared code uses these up to compute the remaining values and eventually bundle up the finished palindromes. It gets started with \(x_3\), and if you account for the \(x\) definition in iv.b and the proof's simplification of \(D(0-\text{...})\) to \(g-\text{...}\) in the \(x_3\) formula, that's straightforward with BINTmod. A copy of \(x_3\) then goes into computing \(y_3\) with computeCarryAndRem. That also drops \(c_2\) into my lap without having to bother with any additional formula.
Now is the moment to start caring about the \(\mu\) adjustment for iv.b, which I completely ignored up to this point because it reacts to \(c_2=2\). Since we're in the shared code now, that needs to be sorted out cleanly; \(c_2=2\) can also occur in v.d, so I augment the condition for the IT with a check that is true for iv.b but false for v.d. I settled on \(z_1=1\) which is also true for v.e, but that's irrelevant because that case cannot generate \(c_2=2\). The correction itself decrements \(x_3\) and sets both \(c_2\) (or more correctly, \(c_2-\mu\)) and \(y_3\) to \(0\).
From here, everything else goes downhill: \(y_2\) comes out of subtracting now-known \(c_2\) from the \(y\)-based value calculated long in advance, then \(z_2\) combines several numbers with BINTmod. Finally, a bit of stackrobatics mixed with {}N variants package the digits into palindromes. Note that I didn't mention \(x_1\) at all; it's always \(1\) across all three cases, so it can be set while bundling up the first palindrome.

If I release an updated version, I'll make sure to include an additional comment or two to note the variables on the stack and their order, because this template is the one section where even I, the author of the whole shebang, couldn't fully reconstruct the data flow from memory and pure brain power. I had to write stuff down, and I hate doing that.

Code:

LABEL AlgS6.s

::

  OVER #1= NOTcase pickType

  2DROP

  1GETLAM #1-SWAP

  DUP#0=IT :: DROP#1- BINT1 ;

  2DUP

  computeCarryAndRem 8ROLL

  DUP#0=ITE

  ::

(iv and v)

    DROP 4PICK #0=ITE

    ::

(v)

      DROP ROTDROP

      #+ 1GETLAM #1--

      {

        ::

(v.a)

          1GETLAM #1-

        ;

        ::

(v.b)

          2DUP#+ #>1 case BINT0

          RDROP 2SWAP 2DROP

          #1=case

          ::

            ONESWAP 1GETLAM

            #1-DUP #ZERO#ONE

            BINT6 P{}N

            ONEONE 1GETLAM

            #1-SWAP THREE{}N

            1GETLAM #2- ONE{}N

          ;

          BINT1 ZEROZEROZERO

          #ZERO#ONE BINT6 P{}N

          SWAP DUP#0=csedrp

          ::

            1GETLAM #1-DUP

            2DUP BINT4 P{}N

            BINT0 ONE{}N

          ;

          ZEROZERO #ZERO#ONE

          BINT5 P{}N

          1GETLAM #2- ONE{}N

        ;

        ::

(v.c)

          DUP#1+ #>2case BINT1

          2DUP#+ #>2case BINT1

          RDROP 2SWAP 2DROP

          DUP#0= 3PICK #1=

          ANDcase

          ::

            ZEROZERO #ZERO#ONE

            BINT6 P{}N

            BINT2 ZEROZEROZERO

            BINT2 BINT5 P{}N

            1GETLAM #2- ONE{}N

          ;

          ONEONE 1GETLAM #1-DUP

          ONEONE BINT6 P{}N UNROT

          OVER#+ 1GETLAM #1-SWAP

          #0=case

          ::

            SWAPDROP #1-DUP

            TWO{}N BINT2 ONE{}N

          ;

          BINT1 OVER#1-_

          4ROLL #+ BINTmod

          BINT1 THREE{}N

          SWAP ONE{}N

        ;

      }

      OVER #1+ NTHELCOMP case

      ::

(v.a,v.b,and v.c)

        2RDROP SWAPONE OVER

        ZEROZERO 3PICK BINT1

        BINT6 P{}N 5UNROLL

        1GETLAM #* #1+ 4UNROLL

        EVAL UNROT BINT3

        subAsZINTthenAlgS ROT

        CARCOMP #0=?SEMI

        3DROP 1GETLAM #1-DUP

        #ZERO#ONE BINT0 4PICK

        BINT5 P{}N SWAP

        DUP#1- 2DUPSWAP

        BINT4 P{}N

        BINT1 #ZERO#ONE THREE{}N

      ;

(v.d and v.e)

      3PICK #<3 ITE

      :: BINT1 4PICK #2+ ;

      #ZERO#ONE

      BINT2 4ROLL #3-

      DUP#0=ITE

      ::

(v.d)

        ROT#1+UNROT

        BINT1 UNROT

        1GETLAM #1-

      ;

(v.e)

      #TWO#ONE

      6ROLL OVER#+ 6UNROLL

      4UNROLL BINT0

    ;

    ::

(iv)

      1GETLAM #1- 3PICKOVER

       #<>case

      ::

(iv.a)

        RDROP ROT#1+UNROT

        5ROLL ROT#1+UNROT

        #1- BINT0

      ;

(iv.b)

      4DROP

      3PICK BINT4 #<ITE

      ::

        BINT2 4PICK

        DUP#0=IT DROPONE

      ;

      ONEONE

      3PICKOVER #<ITE BINT2 BINT3

      ONESWAP 1GETLAM #4-

      OVER 7ROLL

    ;

(iv.b,v.d,and v.e)

    6PICK #- BINTmod

    DUP 8ROLL

    computeCarryAndRem 9ROLL_

    4UNROLLDUP #2=

    8PICK #1= AND IT

    ::

(iv.b.mu)

      DROP#1- ROTDROP

      ZEROZERO 4UNROLL

    ;

    8ROLL SWAP#-

    8ROLL 7ROLL 3PICK#+

    #- BINTmod 6UNROLL

    BINT1 2SWAP 2DUPSWAP BINT1

    BINT6 P{}N 6UNROLL

    SWAPOVER 4PICK

    BINT5 P{}N UNROT

    OVER THREE{}N RDROP

  ;

  ::

(i)

    1GETLAM #1- 2DUP#=

(ii)

    8PICK #0= ANDNOTcase 6ROLL

(iii)

    DROP3PICK #>2case

    ::

(iii.d)

      DROPRDROP DROPSWAPDROP

      #ZERO#ONE SWAP4PICK

      #0=?SKIP DROPONE

      DUP#1+ BINT5 UNPICK_

      ZEROZERO 3PICK 5PICK

      BINT6 P{}N 5UNROLL

      #1-UNROT

      #1- BINTmod DUPUNROT

      computeCarryAndRem 5ROLL

      4UNROLL BINT2 SWAP#-

      OVER 4PICK BINT5 P{}N

      1GETLAM #2-

      ROTOVER THREE{}N

    ;

(iii.a,iii.b,and iii.c)

    SWAP 5ROLL DUP 5UNROLL

    computeCarryAndRem 8ROLL

    ROTOVER 2DUPSWAP BINT4 P{}N

    7UNROLL #+

    5ROLL SWAP 1GETLAM

    2DUP#> NOTcase SKIP

    RDROP 5DROP SWAPDROP#1-_

    #TWO#ONE SWAP3PICK DUP

    #TWO#ONE BINT6 P{}N

    BINT1 3PICK #1- BINT1

    THREE{}N ROT ONE{}N

  ;

  ::

(i,ii,and iv.a)

    2DUP#+ 5ROLL

    computeCarryAndRem 9ROLL_

    5ROLL DUP ROTOVER THREE{}N

    8UNROLL #+

    4ROLL UNROT

    6ROLL 1GETLAM #+

  ;

(i,ii,iii.a,iii.b,iii.c, and iv)

  SWAP#- BINT2 #/

  SWAPOVER #+ 4UNROLL

  OVER 4PICK BINT5 P{}N 5UNROLL

  OVER 4PICK BINT5 P{}N UNROT

;

@

3298 · 05-25-2023, 11:21 PM

After the complexity of AlgS6, following the path of longer numbers is almost refreshing. Almost...
pickType cleans the stack, then it's supposed to dispatch the various types (A1 to A6, B1 to B7) and call AlgMain with a fixed small number of parameters.
A bit of complexity creeps back in with the way I chose to implement the part of dispatching which depends on \(\delta_0\), because I do that with a callback passed via runstream to AlgMain. The reason is that AlgMain needs the carry \(c_1\) anyway, which it can get from the helper function computeCarryAndRem I already explained and used above. That incidentally yields \(z_1\) as well, and if you look close enough, this part of the dispatching work effectively operates on \(z_1\), as a kind of "error handling": when one type's choice of digits would result in invalid \(z_1=0\), it gets fixed by the other type(s) in a group jumping in.
The groups are mostly pairs: A1 with A2, A3 with A4, A5 with A6, B1 with B2, and B6 with B7. The remaining trio form the only larger group: B5 with B3 and B4.
In each of these groups you'll notice that the conditions for every type in it are similar and generally combine nicely (with the notable exceptions of A5+A6 and B1+B2), and the initial palindrome digits are also very similar, all of which makes my job easier.

The group with the simplest condition is A1+A2 with \(\delta_{l-2}>2\). You'll notice two things: a BINT0 pushed before the comparison (the values pushed in this group start with a \(0\), and when the condition doesn't fit, the next comparison uses a BINT0 as parameter for its first command as well), and 3GETLAM used as index into the virtual stack. Its value is initially \(1\), corresponding to the proof's \(l-2\) index. This is necessary here because AlgV can bring us back here with a number shortened by its subtraction (rolling over from types A5+A6 to A1+A2); in that case, 3GETLAM will recall a stored \(2\) instead to skip past a leading \(0\) digit lingering in the virtual stack. When the proof directs you to use Algorithm V and the subtraction found in there shortens the input number, you'd have to redo your indices. During implementation of AlgV however, I found that I could just leave the leading \(0\) digit in the virtual stack and advance this index instead.

The parameters this group hands to AlgMain are: a \(0\) identifying this as type A (type B is represented by a \(1\)), \(x_1\), \(y_1\), and as callback BINT1. This callback is called when AlgMain computes \(z_1\) as \(0\), and its return value is both used as replacement \(z_1\) and subtracted (modulo \(g\)) from \(y_1\). BINT1 is therefore the simplest possibility for this callback, and the fitting one for type A2.
The next group to be split off is A3+A4, with the condition \(\delta_{l-1}\ne 1\). There's no need to handle AlgV weirdness anymore (its subtraction can only result in a single leading \(0\); the following digits will be \(g-1\), which with \(g\ge 5\) will always be type A1+A2), and its values are also straightforward with the explanations above. The only funny business is how the code is ordered with A3+A4 at the end instead of an embedded secondary right after the comparison, owed to how I can't invert #1=case without splitting this single command into two commands: either #1<> case or BINT1 #<>case. Optimization demands I use the single command.

That leaves the three type B groups (B1+B2, B6+B7, B5+B3+B4) as well as A5+A6. The conditions for these don't line up as nicely anymore; some of them even overlap. I even put a comment into the code to point it out (albeit with a typo, B4 overlaps with B6 instead of with B7).
This is the other place after the pair-of-5-digit-palindromes template in AlgS6 where the proof's specifications grant a choice to a strictly conforming implementation, and I put it to good use by shrinking B4 (discarding the half with \(\delta_{l-3}=3\), which makes the conditions for the entire B5+B3+B4 group simpler). Additionally, with B7 being a special case of B2, and their respective partners B6 and B1 being able to be combined in a similar way, I have the opportunity to nest conditions in an advantageous way: if \(\delta_{l-2}=0\), I still have to tell apart A5+A6 from B1+B2 in intricate ways (because the cutoff is \(\delta_{l-3}>2\) or \(\delta_{l-3}>3\) depending on whether \(\delta_{l-3}=\delta_0\); yes, this sort-of duplicates the later calculation of \(z_0\) for A5 or B1, but at least I simplified the formula into a basic equality check), though if not (i.e. \(1\le\delta_{l-2}\le 2\)), the line between B5+B3+B4 and B1+B6+B2+B7 is a smooth \(\delta_{l-3}>2\).
So that's the rest of the branching structure set in stone ... well, not quite, because some runstream / return stack magic lets me share the code for both B1+B2 instances without resorting to a boring subroutine called from two places (or, with the Nosy-like library construction, embedded in one place and called from the other). Look at the comparison which distinguishes B5+B3+B4 from B1+B6+B2+B7: it's a single command with "case" semantics (#>2case, to be precise), followed by the secondary handling B1+B6+B2+B7, and then the B5+B3+B4 code after that. That means not only does the other branch (distinguishing A5+A6 from B1+B2) need this branch to linger on the return stack, it also must pluck out the B1+B6+B2+B7 secondary from that return stack level's runstream and discard the rest.
Keeping the runstream on the return stack is easy, just use something based on IT instead of case. With ITE I also get to advance that runstream past the #>2case if the branch leading to A5+A6 and B1+B2 is taken; if the #>2case consisted of more than a single command, I could have used RSKIP from inside the A5+A6+B1+B2 branch to skip the rest, or turn them into a single object by wrapping them into a secondary (though the latter option would complicate runstream manipulation when that hypothetical secondary gets executed; I'd have to resort to COLAcase inside). When it turns out A5+A6 is the correct path to take, I can just as easily RDROP the return stack level.
The real mind-twister is how to limit that return stack level's runstream to just the B1+B6+B2+B7 secondary when A5+A6 is not taken. The A5+A6 code is discarded by ORcase, that's not exactly black magic, but how does one chop down a return stack level's runstream from somewhere else?

A simple solution would be to use a combination of 'R (which steals the first element out of the first return stack level's runstream and puts it onto the data stack) and RDROP (which exterminates the rest), then do whatever you want with the item on the data stack (e.g. push back onto the return stack with >R, or call it directly with EVAL). Flexible, but not ideal.
An improved variant offers itself if I dig deeper into the effects of runstream- and return-stack-manipulating commands. Any command with "case" semantics (like the ORcase we're working with) acts as COLA or SKIP, depending on whether the condition fits or not. A COLA in turn can be described as executing something as if it was in the parent runstream (i.e. the runstream on the return stack). That means we're looking for something that stands in for the #>2case that got skipped, but without the condition ... okay, it's obvious now, a COLA. That's how the ORcase COLA line came to be.
The take-away here is: analyze what effect a runstream- or return-stack-altering command has on others EVALed by it. You may find an opportunity to engage in further manipulation inside what it EVALs. Describing it as "it's EVALed as if ..." can help understand the state of runstreams.

In A5+A6 and B5+B3+B4 the callback is more complex than a self-pushing constant. When AlgMain calls it, the data stack contains the three values pickType passes to it in levels 4 to 2, and the carry \(c_1\) in level 1. The corresponding \(z_1\) has been determined to be \(0\) at that point and consequently DROPped.
For A6 the callback has to produce \(g-1\) (which happens to be present on level 3 already, in the form of \(x_1\)), and it also needs to adjust \(y_1\) such that subtracting the returned \(g-1\) from it ends up with a \(+1\) over its original value. Phrased like that it's obvious that the callback's compensation has to be adding \(g\) onto it - and this compensation throws off the sum of \(x_1\), \(y_1\), and \(z_1\) by \(g\) as well, so the carry requires adjustment too.
For B3+B4 the callback is responsible for not one but two types. B3 by itself would be another simple BINT1 callback, and B4 has to decrement \(x_1\) by \(1\) and return \(g-2\) (AlgMain will then use that as \(z_1\) and also subtract it from \(y_1=g-1\), resulting in \(1\) as demanded by the proof). The complete callback packages both of these cases with the appropriate \(\delta_{l-3}>1\) condition.

In the condition to choose between A5+A6 and B1+B2, you see a check operating on LAM5. This is an artifact of Algorithm V: when the result of AlgV subtracting from an even-length B1+B2 number looks like an odd-length A5+A6 number, classify it as even-length B1+B2 regardless. LAM5 achieves this by defaulting to \(0\) (i.e. type A), and right at the start of AlgMain it gets set to \(0\) or \(1\) depending on type A or B. AlgV gets asked about its opinion after that, so LAM5 will be \(1\) when AlgV processes a type B number and orders a retry.

One last line to point out: after A1+A2 and A3+A4 are split off, the line BINT2 3PUTLAM ensures that when AlgMain starts, the index stored in that local variable points to the input digit which will line up with \(y_1\). That would be \(\delta_{l-2}\) in A1+A2 and A3+A4 (with the already-mentioned quirk for AlgV), but for the rest it's \(\delta_{l-3}\).

Apart from the goofy return stack manipulation in place of calls to a helper function, this general structure of this type dispatcher ought to translate reasonably well to other languages. If you have an optimizing compiler (a common occurrence in C, C++, and their derivatives, among others), the callback part may be a hindrance as the compiler will have a hard time arguing about what part of the overall program state the callbacks will affect. In that case it might be better to just pull down the computation of \(z_1\) into the various branches of the dispatcher, handle a \(0\) result with non-callback code, and let the compiler's optimization figure out the rest. On the other hand, some languages have features to help the compiler figure out what the callback is allowed to do: Haskell as a pure functional language definitely does, Rust might do too, ...
Even if you don't go that far in adapting my solution, the conditions to tell apart types may be useful without all the other intricacies.

Code:

LABEL pickType.s

::

  NDROP BINT0

  3GETLAM GetElemBotVStack

  #>2case

  ::

(A1 and A2)

    3GETLAM #1- GetElemBotVStack

    3GETLAM GetElemBotVStack #1-

    INCLUDE AlgMain BINT1

  ;

  GetElemBotVStack #1=case

  ::

    BINT2 3PUTLAM

    BINT2 GetElemBotVStack

    DUPONE GetElemBotVStack

    #0=ITE

    ::

(A5,A6,part of B1,and part of B2)

      BINT0 GetElemTopVStack

      OVER#= ITE BINT2 BINT3

      #> 5GETLAM #1=

      ORcase COLA

(A5 and A6)

      RDROP

      BINT0 1GETLAM #1-

      ROT#1+

      AlgMain

      ::

        #1+SWAP 1GETLAM #+SWAP

        3PICK

      ;

    ;

(part of B1,part of B2,B6,and B7)

    #>2case

    ::

(B1,B2,B6,and B7)

(B6 is adjacent to B1)

(and has same initial values)

(B7 is contained in B2)

(and has same initial values)

(half of B4 overlaps with B7)

(as well,but has different)

(initial values)

      ONEONE GetElemBotVStack

      ROT #1-

      AlgMain BINT1

    ;

(B3,B4,and B5)

    DROP ONEONE

    GetElemBotVStack #1-

    1GETLAM #1-

    AlgMain

    ::

      BINT2 GetElemBotVStack #>1

(B3)

      NOTcase BINT1

(B4)

      ROT#1+UNROT OVER#1-_

    ;

  ;

(A3 and A4)

  ZEROZERO GetElemBotVStack #1-

  1GETLAM #1-

  AlgMain BINT1

;

@

You'll likely need some time to understand AlgMain, because not only does it interface with AlgV and the callbacks from pickType, it also implements the meat of all four main algorithms (Algorithm I, Algorithm II, Algorithm III, and Algorithm IV), most importantly the looped step, in a unified manner.

First it stores the \(0\) or \(1\) which is the marker to tell apart types A and B into a local variable, which I named the "index adjustment variable" in fromZINT because this \(0\) or \(1\) will be added to or subtracted from several indices throughout AlgMain. The value stays on the stack though, because for type B it doubles as the extra \(x_0\) present in the first palindrome of that type. For type A the \(0\) is just a harmless dummy value that'll stay there until AlgIfinish or AlgIIfinish reach into these depths again.
The two lines after that calculate the number of loop iterations. The formula for that involved a lot of experimentation and reasoning about what variables my code has access to at this point. Division by \(2\) is a given, same for using the length of the input number as a basis. Using the index to \(x_2\) from \(\delta_{l-1}\) is very much non-trivial, but it ticks all the boxes I need: types A5+A6 are handled as if they were A1+A2 or A3+A4 with an input number shorter by one digit; an AlgV-shortened A1+A2 variant is handled correctly because the pickType quirk of abusing the index to skip past the leading \(0\) is compensated by simply not updating the original input number length in AlgV; and the type B odd-length numbers (Algorithm III) have a smaller-by-1 iteration count compared to their type A peers (Algorithm I). For even-length type B numbers (Algorithm IV) and their type A peers (Algorithm II) this isn't true due to how this lines up with the floor() behavior of integer division, and the proof's phrasing suggests this is wrong, but it's actually perfect: in Algorithm IV, step \(m-1\) is equivalent to an additional iteration of the preceding looped step.

As a byproduct of the division by \(2\), I receive the evenness almost for free in the form of the remainder. The type B quirk means I can't compare it directly against a constant \(1\) or \(0\), but I can compare it against the "index adjustment variable". AlgV is conditionally run based on this, and it will either: return without an effect (if its own supplementary checks say Algorithm V is not applicable); or it will clear the stack, discard the return stack levels pointing to here in AlgMain and to the callback in pickType, perform its adjustments, and retry pickType.
The remainder and type A / type B distinguishing marker are also used to determine which of the algorithms AlgMain shall act as. This isn't important yet, but it's going to be important after the loop, and I don't want to redo the calculation of the remainder, so I just use it here and store the appropriate subroutine in a variable. The order of AlgIVfinish and AlgIIIfinish isn't a mistake, it's another result of the remainder values being switched around for type B.
The iteration count is stored into a variable after the AlgV call because while it's beneficial to ultimately squirrel it away into a local variable for AlgIfinish, AlgIIfinish, AlgIIIfinish, and AlgIVfinish, there are several places inside AlgV which would like to see this still on the stack: it simplifies retrieving the middle digits from the virtual stack to check if they are \(0\), and when AlgV knows it won't return to AlgMain, it can use the value for its adjustments.

With this general preparation out of the way, it's time to explain the stack layout during the loop. More specifically, at the start of step \(i\) it looks like this:
- Buried the deepest are the already-generated digits of the first palindrome, i.e. \(x_0\) to \(x_{i-1}\).
- Next in order are the digits of the second palindrome, \(y_1\) to \(y_{i-1}\).
- The easiest-to-access partial palindrome is \(z_1\) to \(z_{i-1}\).
- Finally, a handful of "working variables". They obviously change over the course of the loop, but at the start of the step there are three of them: the carry \(c_{i-1}\) left over from the previous iteration, a non-buried copy of \(x_{i-1}\) for easy access, and a tentative value for \(x_i\).

The first carry is generated by computeCarryAndRem alongside \(z_1\). I've explained in pickType what happens in conjunction with that - a callback massages the values if \(z_1\) is a forbidden \(0\), effectively switching from e.g. type A1 to A2, or from type B6 to B7. This takes care of the last piece of step 1 in all four main algorithms.
The \(x_{i-1}\) copy is initialized from \(x_0\). Yes, in type A that's the dummy \(0\), but only type B will actually use this working variable. For type A the "index adjustment variable" redirects accesses elsewhere.
The "tentative \(x_i\)" represents what sets step 2 apart from the looped step: the only difference is in the definition of \(x_i\), where step 2 uses \(D(\delta_k-y_1)\) and \(D(\delta_k-y_1)-1\) (I define \(k\) here as \(l-2\) or \(l-3\), whichever lines up with \(y_1\); I have that index prepared in 3GETLAM for a reason), while the looped step uses \(1\) and \(0\). (Yes, I altered one of the formulae a little. It still yields the same result because no type sets \(y_1\) to exactly \(\delta_k\), so \(D(\delta_k-y_1)\) won't be \(0\), and the \(-1\) part is safe to pull out of \(D(\text{...})\).) Do you see how step 2's \(D(\delta_k-y1)\) is switched out for \(1\) in the looped step? That's the "tentative \(x_i\)", and its first value is therefore \(D(\delta_k-y_1)\).

The loop starts with refining the tentative \(x_i\) into the final one. The comparison the proof gives for its piecewise definition of \(x_i\) involves values that also go into the definition of \(y_i\), so the calculations for these are blended together. Caught up in that blend is also a line advancing the LAM3 index, because the digit recalled with that index is one of the values used for \(x_i\) and \(y_i\).
Both of these are subsequently buried in the depths of the stack with UNROLLs. I keep a copy of both around, though, because they are still needed...
The \(y_i\) copy is used up immediately along with the previous carry \(c_{i-1}\) by adding them together. \(x_i\) goes on a somewhat more convoluted journey: type A needs it to be used up immediately as well, but type B needs it to hang around for a full loop iteration, with \(x_{i-1}\) used instead. I achieve this by placing \(x_i\) and \(x_{i-1}\) on adjacent stack levels and grabbing one or the other depending on the "index adjustment variable" LAM5. The other one stays for the next iteration. With type A that means the \(x_{i-1}\) working variable won't actually contain \(x_{i-1}\) beyond the first iteration (it'll remain the dummy \(x_0=0\)), but if it doesn't get used, who cares?
The rest of the loop is simple again: use up the chosen one out of \(x_i\) and \(x_{i-1}\), use up \(c_{i-1}+y_i\). and receive \(z_i\) and \(c_i\) from computeCarryAndRem in return. \(z_i\) gets stashed, \(c_i\) and the leftover \(x_i\) are arranged as two of the next iteration's working variables, and a constant \(1\) joins them as the tentative \(x_{i+1}\).

At the end, one of AlgIfinish, AlgIIfinish, AlgIIIfinish, and AlgIVfinish is responsible for filling in any still-missing digits in the middle of palindrome, performing the correction step, and wrapping the palindrome digits into three lists. All of these are highly specific to one of the four main algorithms, so it's fine to let their paths diverge now.

Code:

LABEL AlgMain.s

::

  3PICK 5PUTLAM

  4GETLAM 3GETLAM #-

  BINT2 #/

  {

    INCLUDE AlgIfinish

    INCLUDE AlgIIfinish

    INCLUDE AlgIVfinish

    INCLUDE AlgIIIfinish

  }

  3PICK#1+_ 7PICK #2* #+

  NTHCOMPDROP 8PUTLAM

  SWAP 5PICK #=?SKIP INCLUDE AlgV

  7PUTLAM

(compute last initial value z_1)

(and first working var:prev)

(carry,then adjust initial)

(values if z_1=0)

  3PICK #2+PICK OVER

  computeCarryAndRem

  :: BINT0 GetElemTopVStack ;

  'R OVER#0= ITE

  ::

    SWAPDROP EVAL

    ROTOVER #- SWAPROT

  ;

  DROPSWAP

(second working var:x_i-1)

  4PICK

(third working var:tentative x_i)

  3GETLAM GetElemBotVStack

  5PICK #- BINTmod

(loop)

  7GETLAM ONE_DO

(calculate y_i and adjust x_i)

(if needed)

    3GETLAM #1+DUP 3PUTLAM

    GetElemBotVStack 5PICK

    2DUP #- #1-UNROT

    #>?SKIP SWAP#1-SWAP

    INCLUDE BINTmod

(save away copies of x_i and y_i)

    OVERINDEX@ #2* #5+ UNROLL

    DUPINDEX@ #5+ UNROLL

(calculate new carry and z_i)

    4ROLL #+

    5GETLAM #2+ROLL

    computeCarryAndRem

    :: INDEX@ GetElemTopVStack ;

(save away z_i and prepare)

(working vars for next iteration)

    SWAPROT BINT1

  LOOP

  8GETLAM COLA_EVAL

;

@

Before I descend into the "finishers" for the four main algorithms, let me sweep up another subroutine interacting with pickType and AlgMain: AlgV.
Algorithm V is meant to treat deficiencies of Algorithm II and Algorithm IV (the ones dealing with even-length numbers) that relate to occurrences of \(0\) in the middle pair of input digits. It does so by decrementing both of them (with normal carry into higher digits) and making up for the difference by adding it onto the (identical) middle digits of the first palindrome. As explained in pickType, there's a safety net which keeps the subtraction from making the input number drop into the range where the first palindrome no longer has a pair of central digits.
The proof gives special treatment to numbers where subtracting \(1\) from both central digits resolves the problem for one but merely shifts it over to the other. The special treatment amounts to applying AlgV twice, so I don't explicitly handle it, and just rely on a retry via pickType knocking at the AlgV door again. It's good to know that this recursion will fizzle out by itself after at most two retries (especially for the proof's correctness), but it doesn't lead to any visible consequences in an implementation.

What is visible, however, is that AlgV's checks aren't concluded after comparing the middle digits with \(0\). The other check compares the length of the input number with \(6\) - because while AlgS6 sends 6-digit numbers starting with anything other than \(1\) into the long-number code, one of its addenda to Algorithm II states that Algorithm V shall be suppressed.

After the checks go through, AlgV cleans up the stack. Only the index of the less significant one out of the middle digits is left standing, the other three items on the data stack have to go. The return stack is cleaned up at the end, right before the retry in the form of a pickType call: 2RDROP cancels the rest of AlgMain and the runstream pickType uses to pass a callback to AlgMain. The next runstream on the return stack points into fromZINT - or right at the end of AlgV, because a retry accumulates return stack levels. I could fix that via tail-recursion with COLA, but that costs an additional command, and the extra return stack levels don't hurt anybody since the recursion is tightly bounded.

Between these cleanup pieces the action of AlgV happens. The main objective is to decrement two adjacent digits on the virtual stack by \(1\) and propagate any resulting carry. I implemented it as "subtraction by hand", with the decrement of the more significant digit sneakily grafted onto the carry propagation (I increment the first carry). Another option would have been to dump the entire virtual stack onto the data stack, split it below the middle digits, use something like subAsZINTthenAlgS (without the AlgS part, somehow...) to subtract \(g+1\) from the upper part, then glue the parts back together and make a new virtual stack from them. I found that not worthwhile when I attempted it, but maybe I just need to give it another go. It would have different consequences on a leading \(0\) for sure, thanks to the round-trip through ZINT representation. As is, the "subtraction by hand" is moderately bulky but otherwise uninteresting - I keep the current carry and the virtual stack index it comes from on the data stack until the carry is \(0\).

Beyond the carry propagation loop I change an offset stored in a local variable. I'm punting the responsibility for adding it onto the middle digits of the first palindrome down to AlgIIfinish and AlgIVfinish - that's more efficient than intercepting the palindromes returned by the recursion into pickType and performing surgery on one of them.
This offset is initialized in fromZINT as a neutral \(0\), but I have to make sure two nested AlgV invocations can cooperate, so I increment it by \(1\) instead of blindly setting it to some constant value.

Finally, I have to handle the rare case of carry propagation reaching the most significant digit and bringing it down to \(0\). As mentioned in pickType and AlgMain, I handle it by leaving this leading \(0\) in the virtual stack and ensuring that LAM3 is \(2\) when it happens (and \(1\) otherwise). A necessary precondition is that the input number was type A5 or A6 which (along with type B) get LAM3 set to \(2\) in the middle of pickType, so all I have to do is reset it to \(1\) for all the other type A5+A6 and B numbers. The length of the input number stays untouched, AlgMain relies on that to work properly, as noted there.

pickType expects a meta containing the input number's digits as parameter on the data stack, but all it does with it is to NDROP it all. I'm lazy and pass the retrying pickType an empty meta instead, represented by the recycled \(0\) carry left over from the manual carry propagation loop.

Code:

LABEL AlgV.s

::

(AlgV checks)

  DUP#1+ GetElemTopVStack

  OVER GetElemTopVStack

  2#0=OR NOT?SEMI

  4GETLAM #6- #0=?SEMI

(AlgV)

  4UNROLL3DROP

(subtract from less significant)

(middle digit)

  DUP GetElemTopVStack

  DUP#0=ITE

  :: DROP BINT2 1GETLAM #1- ;

  #1-1SWAP

  3PICK PutElemTopVStack

(propagate carry, which is)

(1 or 2 in the first iteration)

(to also subtract from more)

(significant middle digit)

  BEGIN

    SWAP#1+ SWAPOVER

    GetElemTopVStack SWAP#-

    1GETLAM OVER#< ITE

    :: 1GETLAM #+ ONESWAP ;

    ZEROSWAP

    3PICK PutElemTopVStack

  #0=UNTIL

(increase stored offset and)

(try again, applying it)

(is done at the end of)

(AlgIIfinish and AlgIVfinish)

  6GETLAM #1+ 6PUTLAM

  SWAPDROPDUP GetElemBotVStack

  #0=?SKIP :: BINT1 3PUTLAM ;

  2RDROP pickType

;

@

It's almost time for the algorithm finishers ... but for them we need to see a pair of helper functions. Their contents aren't hard to understand, but I'll take this opportunity to point out: until the very end of the finishers, the palindromes are represented as a series of separate digits on the stack - half a palindrome's worth for each, since maintaining the duplicates making up other half across all changes would be quite burdensome. The knowledge whether a palindrome should have one or two middle digits is implicit up to this point, but after the finishers are done and all the code paths converge again, it's lost - I have to apply it to the palindromes in time, which I do by picking which helper gets called.

Code:

LABEL makeEvenLenPal.s

::

  dup_

  reversym top& P{}N

;

@

Code:

LABEL makeOddLenPal.s

::

  dup_ SWAPDROP#1-_

  reversym top& P{}N

;

@

Time for a finisher - this is the one for Algorithm I, the simplest one. All four finishers have to take the digit groups generated by the loop in AlgMain as well as its leftover "working variables" and package them into proper palindromes.
Wrapping them in lists at the end is straightforward with the two helper functions shown above (though have a look at how I can temporarily park already-finished palindromes on the return stack - UNROLLing them out of the way instead would have been less efficient due to the variable meta length). Strictly speaking, the digits weren't stored as a proper meta because their length field was missing; that's fixed as part of packaging because the length of each palindrome can be derived from the number of iterations done, which has been stored into LAM7 by AlgMain. Keeping a count up-to-date throughout the main loop would have been much more work (ROLL each of them down, increment, and UNROLL back), and that's not even counting how much they would have gotten in the way of the other operations in the loop.
The more interesting part of the finishers is filling in any missing digits (Algorithm I has just one, \(x_{m+1}=0\)), and the correction step with its nested case structure. Generating the missing digits is weaved into the correction step's code because as central digits in the palindromes they are a prime target for corrections. That also means UNROLLing them into their proper places on the stack occurs after all the nested cases.

Algorithm I has just three cases. Due to a major issue in version 2 of the proof (an optimization on the authors' side works for 7-digit and longer numbers, but it breaks the assertion that numbers with only 5 digits starting with any digit other than \(1\) can use Algorithm I without changes), I'm falling back to version 1. That means case I.3 has two sub-cases. If you're interested in the mathematical background of this issue, it's a consequence of 5-digit numbers using only step 2, with no iterations of the proof's looped step. In my AlgMain implementation that means a single iteration with a "tentative \(x_i\)" that may not be \(1\) like in subsequent iterations. The last "tentative \(x_i\)" is what decides which of the sub-cases of I.3 is needed; values other than \(1\) lead to the I.3.ii case that got optimized away.

The reasoning for version 2's optimization is nonsense, too. It talks about \(\delta_{m-1}=0\) not being allowed, which appears to be erroneously copied from Algorithm II where that is a result of Algorithm V adjusting the input number to make it so. However, it still (mostly) holds: \(y_m=g-1\) and \(x_m=1\) can't occur together because \(y_m=g-1\) would imply that \(\delta_m=z_{m-1}\), which clearly leads to the second branch of the definition of \(x_m\), i.e. \(0\). Unless \(m=2\), that is - then \(x_m\) can have higher values due to the "tentative \(x_2\)" not necessarily being \(1\). That's the core of the 5-digit logic hole made obvious - \(m=2\) corresponds to a 5-digit number.

Annoyingly, I left a little optimization potential behind. As irritated as I am about that, it also presents an opportunity to give you some insight into my methods for arriving at optimized code. They may not be the best for you, but they work for me.

First of all, this part of the code has been completely rewritten when the 5-digit issue was discovered. I still have some memories of the previous version, and an important piece was that cases I.2 and I.3 (in its buggy form) used the same value for the filled-in digit, i.e. \(x_{m+1}=1\). That led to a nesting structure that separated I.1 from the other two cases first, set the value of \(x_{m+1}\) accordingly, and then separating I.2 and I.3.
With I.3.i from version 1 on the table, that's no longer an appropriate branching structure, because it has \(x_{m+1}\=0) like I.1. Invalidating such a deeply baked-in decision is a reason for a rewrite - I'm not afraid of rewrites, attempting to fix already optimized code instead (to work with a shifted foundation) often leads to a mess - the perceived benefit of continuity merely keeps hard-to-read (previously) optimized code around and buries it in more layers of hard-to-read code, and usually it won't even be fully optimized anymore because the new layer has to work around the kinks of the old code. If you're concerned about maintenance, embrace rewriting it to only have that one layer of optimized code, or don't optimize this aggressively in the first place.

My next attempt lumped I.1 with I.2, and separated I.3 first. I decided to deviate from the proof's instructions and initialized \(x_{m+1}\) to \(1\), then overwrite it with \(0\) in cases I.1 and I.3.i. That's reversed from the proof; additional options were: delaying initialization such that the cases push a new value to the stack instead of replacing one, or pushing both values to the stack and utilizing SysRPL combo commands to DROP one or the other. I counted commands for all four options and found this to be the best one. The more thoroughly you explore your options, the better will you be able to find the optimal one.

However, afterwards I spotted something else: the differences between both sub-cases of I.3 are the new values for \(x_{m+1}\) and \(z_m\), and taken together as a 2-digit number these always have the value \(z_m+1\), i.e. I.3.ii is a kind of "error" case where the increment I.3.i prescribes for \(z_m\) goes out of range. That triggered a rewrite of that branch, such that I don't even explicitly distinguish between the sub-cases anymore, and just obtain the new values from 1GETLAM #/ (quotient into \(x_{m+1}\), remainder into \(z_m\)). That requires unconditionally overwriting \(x_{m+1}\), and I failed to realize that this impacts my previous choice for this digit - only I.1 and I.2 are interested in that choice now, which means pushing both values initially and discarding one or the other is one command shorter than the code below.
With that revised choice I.3 needs to discard both, i.e. one more than it currently does, but that merely changes the 2DROP into 3DROP. The code for the I.1 and I.2 branch doesn't change length either, #0= ?SWAPDROP is still two commands inside that secondary. But getting the two values in the appropriate places on the stack is shorter by one command because I recycle the "working variables" from AlgMain's loop: \(c_m\) needs to be brought from level 3 to level 1 because it's used for the decisions, and the other two are always \(0\) (the dummy value where type B numbers would keep a copy of \(x_{i-1}\) for use in the next iteration; Algorithm I deals with type A numbers instead) and \(1\) (the next "tentative \(x_i\)"). SWAPDROPSWAP therefore needs to be replaced with ROT, which is free because combines with the DUP of the \(c_m=2\) check into ROTDUP.

This is a situation I caused myself by optimizing the handling of \(x_{m+1}\) before the I.3 branch had its formulae set in stone, essentially a case of the "premature optimization" boogeyman (if you ignore that changing the formulae was itself an act of optimization). I subscribe to the belief that optimization can't be premature - after all, choosing the right algorithm for a job is "optimization" as well, and you do that very early on. If your assignment is to sort values (and you are not allowed to use library functions for some reason), you'll look for sorting algorithms ... Bubblesort? Nah, it's terrible for anything beyond basic computer science education with blackboard and chalk. Quicksort? Maybe. Mergesort? Also maybe. Perhaps even pluggable algorithms, using some kind of callback. The point is that you necessarily pick an option before you write the code for it. If you get it right immediately, you don't need to throw away code you've spent time and effort to write, and you don't need to recall or reconstruct mental context for additions and replacement.
And don't get me started on optimizing for developer time, which is definitly an optimization target too. The "we can optimize it later" mindset that's especially prevalent in modern scripting languages' communities but also present in compiled ones leads exactly to this repeated revisiting. Each time you think "oh, what was this for again, let me read the docs I wrote back then, ahh crap they don't exist", you're using up the time you gained from your earlier laziness, and then some. Get it right on the first try (or maybe second), and you'll likely save time overall.

Well, that turned into a rant - apologies for that. Back to the SysRPL code. The only thing left to point out is that type A numbers have a dummy \(0\) beyond the palindromes. That gets kicked with the SWAPDROP near the end of the packaging section.

Code:

LABEL AlgIfinish.s

::

  SWAPDROPSWAP

  DUP #2= ITE

  ::

(I.3)

    2DROP

    7GETLAM DUP#2+PICK_

    #1-SWAP #1+UNPICK_

    #1+ 1GETLAM #/

  ;

  ::

(I.2)

    #0=?SEMI

(I.1)

    DROPZERO

  ;

  7GETLAM #2* #1+UNROLL

  7GETLAM INCLUDE makeOddLenPal

  OBJ>R_

  7GETLAM INCLUDE makeEvenLenPal

  OBJ>R_

  7GETLAM #1+ makeOddLenPal

  SWAPDROP R>OBJ_ R>OBJ_

;

@

Algorithm II has much more cases and sub-cases in its correction step than Algorithm I. It also defines two additional digits instead of just one - and in a small oversight, the proof skips the explicit declaration of \(c_m\). Part of these traits happen to be shared with Algorithm III, especially the matter of calculating another carry and remainder (which goes into \(y_m\) for both). I spotted this code sharing potential by thinking "wait, didn't I just write something very similar?" That's clearly not a universally applicable method for optimization, but it shows that a good memory and/or very thorough examination of the algorithm you're implementing can occasionally turn up code sharing opportunities in places you didn't fully expect.

Again, there's optimization potential left. It's even easier to spot than in Algorithm I, if you remember that the middle one out of AlgMain's "working variables" will have a value of \(0\) for type A input numbers: replace 2DROP ZEROSWAP with DROPSWAP. I must've been half asleep when I wrote this garbage - no lessons here other than: don't do that, get your sleep! Big Grin

Inside the already deeply nested case II.2.ii.c, the addendum for 6-digit numbers is embedded with even more impressive indents. The S6.not1 prefix in my case-referencing comments make it easy to find. It consists of an unnamed general case (\(x_2\ne 0\)) and four named ones, though out of these iv is merely theoretical. Furthermore, i is a special case of iii, so I handle them together - the conditions \(x_2=0\) and \(y_1=g-1\) suffice.
They are also the only ones to deviate from the palindrome lengths normally prescribed by Algorithm II. Since all of the levels of nesting are entered via the "case" command, they don't leave a return stack level, so I can skip the normal packaging for Algorithm II with a simple RDROP (combined with DROP here) and proceed with packaging in the style of the "short / small" numbers, i.e. with fixed-length palindromes constructed explicitly in full length (which is more efficient than the reversym-based helpers at these lengths). Even the dummy \(0\) found beyond \(x_1\) on the stack gets recycled in the process.

Allowing 6-digit numbers into the algorithm principally made for 8-digit and longer numbers doesn't lead to issues like in Algorithm I, even though Algorithm II also skips the looped step for 6-digit numbers and therefore is similarly susceptible to the last "tentative \(x_i\)" being something other than \(1\). Changes between version 1 and version 2 suggest the authors realized the issue this time - but only due to types A5 and A6 on 7-digit numbers being so similar to 6-digit numbers, as the new text in II.2.ii.c reveals. Interestingly, they didn't ban Algorithm V (which can shorten them into real 6-digit numbers of type A1 or A2!) or mandate an altered version of Algorithm II in that situation, which makes me question whether those addenda are even necessary anymore. I guess preventing Algorithm V from retrying with a 6-digit number starting with \(1\) is needed after all, because that would be type B, and Algorithm IV probably doesn't like numbers of that length, but since they are so rare, can they be intercepted more efficiently somehow, and Algorithm V let loose on the rest of 6-digit numbers? And can the replacement for II.2.ii.c be reduced or eliminated?
The resulting palindromes would be different, so I didn't investigate further, but it's especially telling that the changes between versions eliminate the old II.2.ii.b and split off a new one from II.2.ii.c - the case affected by both addenda for 6-digit numbers. The additional condition \(z_{m-1}=0\) for what's left under the II.2.ii.c label even makes the phrasing in the 6-digit addenda outdated: "[...] we have to account also for the possibility \(z_2=0\) in the Step II.2.ii.c. This is the temporary configuration in Step II.2.ii.c (\(c_2=0\), \(y_3=y_2=0)\) with \(z_2=0\). [...]" I'll leave that there as food for thought.

With the 6-digit-specific addendum done, there isn't much interesting code left here. The addendum is activated based on the original number length, so it's activated on the same numbers as the ban on Algorithm V. Specifically, this excludes those which turned into 6-digit numbers under the influence of Algorithm V, just in case one of them hits 6-digit special cases i or iii which don't follow the normal packaging code. It's also a plausible interpretation based on the proof's hard split between long numbers and almost-an-afterthough short / small ones - in its conclusions it avoids forward references like the plague, so relying on Algorithm V-caused 6-digit numbers always getting a 6-digit palindrome seems a bit of a stretch when the 6-digit cases are listed many pages later.
Speaking of Algorithm V, concealed in that normal packaging code there are the commands 6GETLAM #+ which apply the Algorithm V adjustment to the first palindrome's middle digit pair.

Code:

LABEL AlgIIfinish.s

::

  2DROP ZEROSWAP

  INCLUDE AlgIIandIIIcarry

(II.1)

  DUP#1= ITE_DROP

  ::

    #0=case

    ::

(II.2)

      SWAP#1+SWAP

(II.2.i)

      DUP#0= NOTcase #1-

(II.2.ii)

      DROP 7GETLAM #2+ROLL

      DUP#0=csedrp

      ::

        OVER #0=case

        ::

          4GETLAM #6- #0=case

          ::

(S6.not1)

            5ROLL DUP#0=csedrp

            ::

              SWAPDROP 3PICK#+

              1GETLAM #=case

              ::

(S6.not1.i and S6.not1.iii)

                DROPRDROP

                UNROT #1+SWAP

                ZEROZEROZERO

                5PICK BINT6 P{}N

                ONEONE TWO{}N

                ROT #3- ONE{}N

              ;

(S6.not1.ii)

              ROT #1-UNROT

              1GETLAM #1-UNROT

              SWAP#1+SWAP

              ZEROSWAP

              BINT1 BINT0

              1GETLAM #2-

            ;

            #1- 5UNROLL

            DUP4UNROLL

            1GETLAM #1-SWAP

          ;

(II.2.ii.c)

          BINT2 ROTDROPSWAP

          1GETLAM #1- 4UNROLL

          6ROLL #1- 6UNROLL

          1GETLAM #4-

        ;

(II.2.ii.b)

        SWAP#1-SWAP

        7GETLAM #1+UNROLL

        #ZERO#ONE

      ;

(II.2.ii.a)

      #1- 7GETLAM #2+UNROLL

      SWAP#1+SWAP

      1GETLAM #2-

    ;

(II.3)

    DROPSWAP #1-

    4ROLL #1- 4UNROLL

    ONESWAP

  ;

  7GETLAM #2+UNROLL

  7GETLAM #2* #2+UNROLL

  7GETLAM makeEvenLenPal OBJ>R_

  7GETLAM #1+ makeOddLenPal

  OBJ>R_ 6GETLAM #+

  7GETLAM #1+ makeEvenLenPal

  SWAPDROP R>OBJ_ R>OBJ_

;

@

Code:

LABEL AlgIIandIIIcarry.s

::

  3PICK computeCarryAndRem

  :: 7GETLAM GetElemTopVStack ;

  SWAP

;

@

As an algorithm for type B numbers Algorithm III actually uses the \(x_{i-1}\) "working variable" (unlike Algorithm I and Algorithm II). That is quite useful, because the formula for the extra digit \(y_m\) and the carry \(c_m\) needs it. Thus it goes into the parameter for the preceding snippet shared with Algorithm II. It's also altered in III.3.iii and III.3.iv, but attempting to keep it around (to save the commands for fetching the copy out of the first palindrome's digit group) is not worth the commands spent on DROPping it in the other cases and working around the additional occupied stack level.

Type B also means no more dummy \(0\) behind the palindromes. The \(1\) taking its place gets incorporated into the longest palindrome as \(x_0\).

In the four sub-cases of III.3 we have the unusual situation of two conditions for distinguishing them which independently control separate pieces of the adjustments: \(y_{m-1}=0\) pairs III.3.i with III.3.ii and III.3.iii with III.3.iv in adjusting \(y_{m-1}\) itself and \(x_{m-1}\); separately \(z_{m-1}\) pairs III.3.i with III.3.iii and III.3.ii with III.3.iv in adjusting \(z_{m-1}\) itself and both of the digits filled in after the loop, i.e. \(x_m\) and \(y_m\). Consequently I implemented them as two sequential ITE checks. Because I flipped the initial value of \(x_m\) compared to the proof, the latter gets downgraded into an IT, omitting the else-branch to save a command (and the code to handle III.1 and III.2 gets more or less swapped, but that's a neutral change as far as my optimization targets are concerned).
Thinking of these four sub-cases as a 2x2 grid may help - there are digit adjustments common across a row, and other digit adjustments common across a column.

Code:

LABEL AlgIIIfinish.s

::

  UNROT #+

  AlgIIandIIIcarry

(III.2)

  DUP#0= ITE_DROP

  ::

(III.1)

    #1=case SWAP#1-SWAP

(III.3)

    ROT#1+

    1GETLAM OVER#= ITE

(III.3.ii and III.3.iv)

    DROPZERO

    ::

(III.3.i and III.3.iii)

      SWAP#1- SWAPROT

      #1-UNROT

    ;

    UNROT

    7GETLAM #3+ ROLL

    DUP#0=IT

    ::

(III.3.iii AND III.3.iv)

      DROP

      7GETLAM #2* #1+DUP

      #2+ROLL #1-SWAP #1+UNROLL

      1GETLAM

    ;

    #1-

    7GETLAM #3+ UNROLL

  ;

  7GETLAM #2+UNROLL

  7GETLAM #2* #2+UNROLL

  7GETLAM makeEvenLenPal OBJ>R_

  7GETLAM #1+ makeOddLenPal

  OBJ>R_

  7GETLAM #2+ makeOddLenPal

  R>OBJ_ R>OBJ_

;

@

Almost through - here comes the finisher for Algorithm IV. The proof fills in values \(x_{m-1}\), \(y_{m-1}\), and \(z_{m-1}\). Like the preceding two algorithms it also uses \(c_{m-1}\) without defining it, but the formula for a run-of-the-mill carry isn't hard to guess. With that little pothole patched, these four values amount to an additional iteration of the main loop, which AlgMain has already taken care of.

The correction step consists of many more cases than the other three main algorithms, and its first layer of dispatching uses not just \(c_{m-1}\) but also \(x_{m-1}\). The proof even phrases it as dispatching on the sum of these two values, but then it proceeds to look into the summands in order to tell apart IV.4 and IV.5, so I broke it down into two layers dispatching on \(x_{m-1}\) with an ITE, and then inside its two branches "case" variants checking the three possibilites for \(x_{m-1}\). This duplicates case IV.1 because it's now split over both branches of the ITE with one case on each side, but that's no big deal since it's the do-nothing case. That leaves two cases per ITE side, with IV.2+IV.3 and IV.4 on the \(x_{m-1}=0\) side, as well as IV.5 and IV.6 on the other.
Conveniently, \(c_{m-1}\) and \(x_{m-1}\) are present on the stack in the form of AlgMain's "working variables". The third of those (the constant \(1\)) is DROPped immediately.
Since each branch knows the value of \(x_{m-1}\) thanks to dispatching on it, my adjustments to that value don't involve ROLLing it down to level 1, subjecting it to addition or subtraction, and UNROLLing it back, as is done for most adjusted variables across all four main algorithms' adjustment steps. Instead I just UNPICK a constant onto the correct stack level.

In IV.2 (let's postpone its companion IV.3 for a moment) the proof's cases separate out an easy case named IV.2.i with the condition \(z_{m-1}\ne 0\) first. The rest gets split into two cases IV.2.ii and IV.2.iii on \(y_{m-2}\ne 0\), and both of those have three sub-cases with quite similar figures. The conditions are slightly different, but that's fixable: IV.2.ii.c says that \(z_{m-2}\) can't be anything other than \(g-1\), so I can restructure the conditions such that IV.2.ii.a keeps only \(z_{m-2}\ne g-1\) and IV.2.ii.c gains \(z_{m-2}=g-1\) in addition to the \(y_{m-1}=1\) it already had. That aligns the sub-case conditions between IV.2.ii and IV.2.iii, and the figures inside are also eerily similar - the only difference between them are how \(y_{m-2}\) and \(x_{m-2}\) are affected. That means they all make a 2x3 grid of cases, so I can separate the choice between IV.2.ii and IV.2.iii from the sub-case choice like I did in Algorithm III. One part of this is even moved to the separate file AlgIVfinish2and4ii, because the adjustments that are different between IV.2.ii and IV.2.iii and common across their respective sub-cases are the same as the adjustments that are different between IV.4.ii.a and IV.4.ii.b, including the condition to decide between them.
Remodeling the conditions of IV.2.ii's sub-cases also flips the suggested nesting order of checks: \(z_{m-2}=g-1\) first, then \(y_{m-1}=1\). I remodeled this case's conditions instead of the other one's because this order lets me share the code for common adjustments to \(z_{m-2}\) and \(z_{m-1}\) in sub-cases .b and .c.

IV.3 has a condition which suggests splitting it away from IV.2 early, even before IV.2.i. However, the conclusions reached in IV.3, notably \(z_{m-1}=0\), \(y_{m-2}\ne 0\), and \(z_{m-2}\ne g-1\), let me trace a path through IV.2's sub-cases into IV.2.ii.a. Indeed, a close look shows that IV.3 is a special case of IV.2.ii.a with a distractingly complicated description, so I don't need extra code for it at all.

IV.4 has an easy case in IV.4.i to be separated early and a somewhat more complex IV.4.ii which has enough similiarity with IV.2 that the code written for the differences between IV.2.ii and IV.2.iii also works for IV.4.ii.a and IV.4.ii.b. As already mentioned, that part is shared via a separate library command - the two places which need to share this snippet are nested too deeply to receive any advantages from attempting to share via runstream manipulation like in pickType.
Finally, there is IV.4.iii which has some superficial similarity with the sub-cases of IV.2.ii, but in the details it's too different to share code with those. In the end the amount of code sharing I was able to achieve was limited to common pieces of the unnamed sub-cases in IV.4.iii.a and similarly the common pieces of the unnamed sub-cases in IV.4.iii.b. The rest of the adjustments had to be done with a separate set of commands.

On the \(x_{m-1}=1\) side, IV.6 is thankfully short and simple, with no sub-cases. IV.5 compensates for that, hitting the proof's reader with another large set of them. They aren't so bad though, because they break down into just three groups.
IV.5.i and IV.5.ii clearly belong together with a shared \(z_{m-1}\ne g-1\) condition and similar adjustments to the palindrome digits. The latter is nothing more than an "error case" for the former, preventing an out-of-range \(y_{m-1}\) with a simple extra adjustment.
IV.5.iii.a has to fend for itself. Oh well, it's just a single case.
The remaining nine cases can be arranged into a 3x3 grid. It can even be reduced to a 2x3 grid by spotting that the difference between sub-cases .b and .c (or .c and .d instead for IV.5.iii) amounts to a single digit \(y_{m-1}\) set to a different value, and the condition to select between them checks if that digit has a certain value - so they can be merged with a formula in place of constants for that digit. Then, it can be reduced further to a 2x2 grid by merging IV.5.iii (or what's left of it after IV.5.iii.a has been thrown out) with IV.5.iv: the differences are the adjustments for \(y_{m-2}\) and \(z_{m-2}\), which both are a constant in one and variable in the other, but transferring the variable formula to the constant case achieves the correct result.

There may be further optimizations possible, but they are not as trivial as those I showed in the examination of the other algorithms' correction steps. Ideas I have include pulling a bunch of digits touched by many of the cases out of the digit groups down to fixed spots on the stack ahead of time (deduplicating all that 7GETLAM stuff), and merging \(x_{m-1}\) with \(y_{m-1}\) into a two-digit number for the duration of the correction step (they seem to be frequently adjusted as if they were just that). For the other algorithms, neither appears useful (apart from I.3 which essentially does have the two-digit number trick already), but here the wide decision tree of cases may push them over the edge.

Code:

LABEL AlgIVfinish.s

::

  DROP #0=ITE

  ::

(IV.1)

    DUP#1= caseDROP

    #0=case

    ::

(IV.2 and IV.3)

      DUP#0=case

      ::

        INCLUDE AlgIVfinish2and4ii

        SWAP#1+

        1GETLAM OVER#= ITE

        ::

(IV.2.ii.b,IV.2.ii.c,)

(IV.2.iii.b,and IV.2.iii.c)

          DROP BINT3 BINT1

          7GETLAM #2+ROLL

          DUP#1= ITE

(IV.2.ii.b and IV.2.iii.b)

          :: 1GETLAM #+ ;

(IV.2.ii.c and IV.2.iii.c)

          SWAP#1+SWAP

          #2- 7GETLAM #2+UNROLL

        ;

        ::

(IV.2.ii.a,IV.2.iii.a,and IV.3)

          SWAPDROP

          7GETLAM ROLL #1-

          7GETLAM UNROLL

          ONEONE

        ;

        7GETLAM #2* #1+UNPICK_

      ;

(IV.2.i)

      #1-

      7GETLAM #1+ROLL #1+

      7GETLAM #1+UNROLL

    ;

(IV.4)

    #1+ 7GETLAM #1+ROLL

    1GETLAM 3PICK #<>case

(IV.4.i)

    :: #1- 7GETLAM #1+UNROLL ;

    ROT#1+ ROT OVER#=case

    ::

(IV.4.iii)

      #2- SWAP2DUP #>case

      ::

(IV.4.iii.a)

        #2+ 7GETLAM UNROLL

        DUPDUP

        7GETLAM #2* #1+UNPICK_

        7GETLAM #2+ROLL

        #1+DUP 1GETLAM #=ITE

        DROPZERO

        ::

          7GETLAM #2* #1+DUP

          #2+ROLL #1-SWAP

          #1+UNROLL

        ;

        7GETLAM #2+UNROLL

      ;

(IV.4.iii.b)

      #3- 7GETLAM UNROLL

      DROPZERO BINT2

      7GETLAM #2* UNPICK_

      7GETLAM #1+ROLL DUP#0=IT

      ::

        DROP

        7GETLAM #2* DUP#1+PICK

        #1-SWAP UNPICK_

        1GETLAM

      ;

      #1- 7GETLAM #1+UNROLL

      BINT3

    ;

(IV.4.ii)

    SWAP #2- 7GETLAM UNROLL

    ONEONE 7GETLAM #2* #1+UNPICK_

    AlgIVfinish2and4ii

  ;

  ::

(IV.1)

    DUP#0=csDROP

    #1=case

    ::

(IV.5)

      #1+ 1GETLAM OVER#=case

      ::

(IV.5.iii,IV.5.iv,and IV.5.v)

        7GETLAM #2+ROLL #1+SWAP

        OVER#=

        3PICK #0= ORNOT case

        ::

(IV.5.iii.a)

          7GETLAM #1+UNROLL

          7GETLAM ROLL #1+

          7GETLAM UNROLL

          BINT0

          7GETLAM #2* UNPICK_

          #1- 1GETLAM #2-

        ;

        DUP#1= IT

        ::

(IV.5.v)

          1GETLAM #+

          7GETLAM #2* #1+DUP

          #1+ROLL #1-SWAP UNROLL

        ;

        #2- 7GETLAM #1+UNROLL

        7GETLAM ROLL DUP #>1 ITE

        ::

(IV.5.iii.b,IV.5.iv.a,)

(and IV.5.v.a)

          BINT2

          7GETLAM #2* UNPICK_

        ;

(IV.5.iii.c,IV.5.iii.d,)

(IV.5.iv.b,IV.5.iv.c,)

(IV.5.v.b,and IV.5.v.c)

        :: 1GETLAM #+ ;

        #2- 7GETLAM UNROLL

        #1+ BINT1

      ;

(IV.5.i and IV.5.ii)

      7GETLAM #1+ROLL

      DUP#0=IT

      ::

(IV.5.ii)

        7GETLAM #2* UNPICK_

        1GETLAM

      ;

      #1- 7GETLAM #1+UNROLL

    ;

(IV.6)

    7GETLAM #1+ROLL #1-

    7GETLAM #1+UNROLL

    DROPZERO

  ;

  7GETLAM makeOddLenPal OBJ>R_

  7GETLAM makeEvenLenPal OBJ>R_

  6GETLAM #+

  7GETLAM #1+ makeEvenLenPal

  R>OBJ_ R>OBJ_

;

@

Code:

LABEL AlgIVfinish2and4ii.s

::

(IV.2 and IV.4.ii)

  7GETLAM #2+ROLL DUP#0=IT

  ::

(IV.2.iii and IV.4.ii.b)

    DROP

    7GETLAM #2* #1+DUP

    #1+ROLL #1-SWAP UNROLL

    1GETLAM

  ;

  #1- 7GETLAM #2+UNROLL

;

@

That was the final source file. Phew! Any questions?

Gerald H · 05-26-2023, 08:52 AM

Now very interested in your library compiler: library 859 has size

4110.5

but after decompiling & recompiling using native 50g tools size is

9647.5

with no (noticeable?) change to time of operations.

Very grateful for expansive explanation of difficulties, trust elaborating to your students clarifies points for yourself - the exegesis in itself requires more effort than I can muster.

3298 · 05-26-2023, 12:13 PM

(05-26-2023 08:52 AM)Gerald H Wrote: Now very interested in your library compiler: library 859 has size

4110.5

but after decompiling & recompiling using native 50g tools size is

9647.5

with no (noticeable?) change to time of operations.

That's the library splitter not handling library commands nested in one another very well - because it's pretty much impossible. It's seeing that e.g. pickType is in the library's link table and extracts it to a file, but the copy embedded in fromZINT stays there. Then AlgMain is embedded in both copies of pickType, but it's also in the link table, so you get a third copy in a separate file ... that snowballs quickly.
If you were to replace all the embedded copies with ROMPTRs (or IDs which CRLIB can turn into ROMPTRs) referencing the non-embedded one, you'd likely trim it back down to ... around 4200 bytes, maybe? The library splitter's (forced) shortcomings make my tool look much better than it really is.

(05-26-2023 08:52 AM)Gerald H Wrote: Very grateful for expansive explanation of difficulties, trust elaborating to your students clarifies points for yourself - the exegesis in itself requires more effort than I can muster.

As the kids would say: ROFL!
But you have a point, my complex writing style isn't known for being easy to understand. Sad

Too long sentences, too deeply nested subordinate clauses, etc.

Gerald H · 05-27-2023, 06:25 AM

Suggestions for replacing Zinttodigits & Digitstozint, smaller & faster(? just):

Code:

Z2D

Size: 46.

CkSum: # D740h

::

  FPTR2 ^Z>S

  DUPLEN$

  DUP1LAMBIND

  ZERO_DO

  DUP

  CAR$

  CHR>#

  BINT48

  #-

  UNCOERCE

  SWAP

  CDR$

  LOOP

  DROP

  1GETABND

;

D2Z

Size: 28.5

CkSum: # F9A4h

::

  1GETLAM

  NULL$SWAP

  ZERO_DO

  SWAP

  #>$

  &$

  LOOP

  FPTR2 ^S>Z

;

Gerald H · (This post was last modified: 05-27-2023 08:25 AM by Gerald H.)

Please ignore this suggestion: Only works if last digit of integer same as number of digits in integer! Product of testing for input 1234567.

Smaller version of Z2D:

Code:

Z2D

Size: 31.

CkSum: # E212h

::

  FPTR2 ^Z>S

  DUPLEN$

  #1+_ONE_DO

  DUPINDEX@

  SUB$1#

  BINT48

  #-SWAP

  LOOP

  DROPDUP

;

3298 · 05-27-2023, 04:39 PM

String-based conversion, mhm ... pray forgive my doubts, but have you checked if that works in bases other than decimal? Because I haven't seen a knob to turn which makes builtin number-to-string and string-to-number commands work in e.g. base 101.
And I'd like to know how to unambiguously represent any base 9999 digit with one 8-bit character. Big Grin

That is to say, for the purpose I wrote them for, I believe my versions still have an advantage. I would be glad to be proven wrong.

Gerald H · 05-27-2023, 04:42 PM

Sorry, unable to prove you wrong, suggestion withdrawn.