05-25-2023, 11:21 PM
Post: #122
RE: Tripartite Palindromic Partition of Integer (HP 50g) Challenge
After the complexity of AlgS6, following the path of longer numbers is almost refreshing. Almost...
pickType cleans the stack, then it's supposed to dispatch the various types (A1 to A6, B1 to B7) and call AlgMain with a fixed small number of parameters.
A bit of complexity creeps back in with the way I chose to implement the part of dispatching which depends on \(\delta_0\), because I do that with a callback passed via runstream to AlgMain. The reason is that AlgMain needs the carry \(c_1\) anyway, which it can get from the helper function computeCarryAndRem I already explained and used above. That incidentally yields \(z_1\) as well, and if you look closely enough, this part of the dispatching work effectively operates on \(z_1\), as a kind of "error handling": when one type's choice of digits would result in an invalid \(z_1=0\), it gets fixed by the other type(s) in a group jumping in.
The groups are mostly pairs: A1 with A2, A3 with A4, A5 with A6, B1 with B2, and B6 with B7. The remaining trio form the only larger group: B5 with B3 and B4.
In each of these groups you'll notice that the conditions for every type in it are similar and generally combine nicely (with the notable exceptions of A5+A6 and B1+B2), and the initial palindrome digits are also very similar, all of which makes my job easier.

The group with the simplest condition is A1+A2 with \(\delta_{l-2}>2\). You'll notice two things: a BINT0 pushed before the comparison (the values pushed in this group start with a \(0\), and when the condition doesn't fit, the next comparison uses a BINT0 as parameter for its first command as well), and 3GETLAM used as index into the virtual stack. Its value is initially \(1\), corresponding to the proof's \(l-2\) index. This is necessary here because AlgV can bring us back here with a number shortened by its subtraction (rolling over from types A5+A6 to A1+A2); in that case, 3GETLAM will recall a stored \(2\) instead to skip past a leading \(0\) digit lingering in the virtual stack. When the proof directs you to use Algorithm V and the subtraction found in there shortens the input number, you'd have to redo your indices. During implementation of AlgV however, I found that I could just leave the leading \(0\) digit in the virtual stack and advance this index instead.

The parameters this group hands to AlgMain are: a \(0\) identifying this as type A (type B is represented by a \(1\)), \(x_1\), \(y_1\), and as callback BINT1. This callback is called when AlgMain computes \(z_1\) as \(0\), and its return value is both used as replacement \(z_1\) and subtracted (modulo \(g\)) from \(y_1\). BINT1 is therefore the simplest possibility for this callback, and the fitting one for type A2.
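Outside of stack terms, the contract boils down to something like this rough Python sketch (the names are mine, and I widen the callback signature a little so the richer A6 and B3+B4 callbacks described further down fit the same shape):
Code:
def callback_const_one(g, x1, y1, c1):
    # the BINT1 case: hand back 1 as the replacement z_1
    return 1, x1, y1, c1

def fix_forbidden_z1(g, x1, y1, z1, c1, callback):
    # only acts when the type's normal choice produced the forbidden z_1 = 0
    if z1 == 0:
        z1, x1, y1, c1 = callback(g, x1, y1, c1)
        y1 = (y1 - z1) % g   # the replacement z_1 is subtracted from y_1 mod g
    return x1, y1, z1, c1
Calling fix_forbidden_z1(g, x1, y1, z1, c1, callback_const_one) then covers the A1-to-A2 switch just described.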
The next group to be split off is A3+A4, with the condition \(\delta_{l-1}\ne 1\). There's no need to handle AlgV weirdness anymore (its subtraction can only result in a single leading \(0\); the following digits will be \(g-1\), which with \(g\ge 5\) will always be type A1+A2), and its values are also straightforward with the explanations above. The only funny business is how the code is ordered with A3+A4 at the end instead of in an embedded secondary right after the comparison, owing to the fact that I can't invert #1=case without splitting this single command into two commands: either #1<> case or BINT1 #<>case. Optimization demands I use the single command.

That leaves the three type B groups (B1+B2, B6+B7, B5+B3+B4) as well as A5+A6. The conditions for these don't line up as nicely anymore; some of them even overlap. I even put a comment into the code to point it out (albeit with a typo, B4 overlaps with B6 instead of with B7).
This is the other place after the pair-of-5-digit-palindromes template in AlgS6 where the proof's specifications grant a choice to a strictly conforming implementation, and I put it to good use by shrinking B4 (discarding the half with \(\delta_{l-3}=3\), which makes the conditions for the entire B5+B3+B4 group simpler). Additionally, with B7 being a special case of B2, and their respective partners B6 and B1 being able to be combined in a similar way, I have the opportunity to nest conditions in an advantageous way: if \(\delta_{l-2}=0\), I still have to tell apart A5+A6 from B1+B2 in intricate ways (because the cutoff is \(\delta_{l-3}>2\) or \(\delta_{l-3}>3\) depending on whether \(\delta_{l-3}=\delta_0\); yes, this sort-of duplicates the later calculation of \(z_0\) for A5 or B1, but at least I simplified the formula into a basic equality check), though if not (i.e. \(1\le\delta_{l-2}\le 2\)), the line between B5+B3+B4 and B1+B6+B2+B7 is a smooth \(\delta_{l-3}>2\).
So that's the rest of the branching structure set in stone ... well, not quite, because some runstream / return stack magic lets me share the code for both B1+B2 instances without resorting to a boring subroutine called from two places (or, with the Nosy-like library construction, embedded in one place and called from the other). Look at the comparison which distinguishes B5+B3+B4 from B1+B6+B2+B7: it's a single command with "case" semantics (#>2case, to be precise), followed by the secondary handling B1+B6+B2+B7, and then the B5+B3+B4 code after that. That means not only does the other branch (distinguishing A5+A6 from B1+B2) need this branch to linger on the return stack, it also must pluck out the B1+B6+B2+B7 secondary from that return stack level's runstream and discard the rest.
Keeping the runstream on the return stack is easy: just use something based on IT instead of case. With ITE I also get to advance that runstream past the #>2case if the branch leading to A5+A6 and B1+B2 is taken; if the #>2case consisted of more than a single command, I could have used RSKIP from inside the A5+A6+B1+B2 branch to skip the rest, or turn them into a single object by wrapping them into a secondary (though the latter option would complicate runstream manipulation when that hypothetical secondary gets executed; I'd have to resort to COLAcase inside). When it turns out A5+A6 is the correct path to take, I can just as easily RDROP the return stack level.
The real mind-twister is how to limit that return stack level's runstream to just the B1+B6+B2+B7 secondary when A5+A6 is not taken. The A5+A6 code is discarded by ORcase, that's not exactly black magic, but how does one chop down a return stack level's runstream from somewhere else?

A simple solution would be to use a combination of 'R (which steals the first element out of the first return stack level's runstream and puts it onto the data stack) and RDROP (which exterminates the rest), then do whatever you want with the item on the data stack (e.g. push back onto the return stack with >R, or call it directly with EVAL). Flexible, but not ideal.
An improved variant offers itself if I dig deeper into the effects of runstream- and return-stack-manipulating commands. Any command with "case" semantics (like the ORcase we're working with) acts as COLA or SKIP, depending on whether the condition fits or not. A COLA in turn can be described as executing something as if it was in the parent runstream (i.e. the runstream on the return stack). That means we're looking for something that stands in for the #>2case that got skipped, but without the condition ... okay, it's obvious now, a COLA. That's how the ORcase COLA line came to be.
The take-away here is: analyze what effect a runstream- or return-stack-altering command has on others EVALed by it. You may find an opportunity to engage in further manipulation inside what it EVALs. Describing it as "it's EVALed as if ..." can help understand the state of runstreams.

In A5+A6 and B5+B3+B4 the callback is more complex than a self-pushing constant. When AlgMain calls it, the data stack contains the three values pickType passes to it in levels 4 to 2, and the carry \(c_1\) in level 1. The corresponding \(z_1\) has been determined to be \(0\) at that point and consequently DROPped.
For A6 the callback has to produce \(g-1\) (which happens to be present on level 3 already, in the form of \(x_1\)), and it also needs to adjust \(y_1\) such that subtracting the returned \(g-1\) from it ends up with a \(+1\) over its original value. Phrased like that it's obvious that the callback's compensation has to be adding \(g\) onto it - and this compensation throws off the sum of \(x_1\), \(y_1\), and \(z_1\) by \(g\) as well, so the carry requires adjustment too.
For B3+B4 the callback is responsible for not one but two types. B3 by itself would be another simple BINT1 callback, and B4 has to decrement \(x_1\) by \(1\) and return \(g-2\) (AlgMain will then use that as \(z_1\) and also subtract it from \(y_1=g-1\), resulting in \(1\) as demanded by the proof). The complete callback packages both of these cases with the appropriate \(\delta_{l-3}>1\) condition.
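Continuing the rough Python model from above (again just a sketch transcribed from the description, not the actual stack code; the B3+B4 one takes the relevant input digit as an extra argument instead of reading it off the virtual stack):
Code:
def callback_A6(g, x1, y1, c1):
    # z_1 becomes g-1 (which equals x_1 for this type); y_1 gets g added so
    # the later subtraction nets +1, and that extra g bumps the carry too
    return g - 1, x1, y1 + g, c1 + 1

def callback_B3_B4(g, x1, y1, c1, delta_lm3):
    if delta_lm3 > 1:                   # B4, per my reading of the condition
        return g - 2, x1 - 1, y1, c1    # g-2 is subtracted from y_1 = g-1 -> 1
    return 1, x1, y1, c1                # B3: the plain constant-1 case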

In the condition to choose between A5+A6 and B1+B2, you see a check operating on LAM5. This is an artifact of Algorithm V: when the result of AlgV subtracting from an even-length B1+B2 number looks like an odd-length A5+A6 number, classify it as even-length B1+B2 regardless. LAM5 achieves this by defaulting to \(0\) (i.e. type A), and right at the start of AlgMain it gets set to \(0\) or \(1\) depending on type A or B. AlgV gets asked about its opinion after that, so LAM5 will be \(1\) when AlgV processes a type B number and orders a retry.

One last line to point out: after A1+A2 and A3+A4 are split off, the line BINT2 3PUTLAM ensures that when AlgMain starts, the index stored in that local variable points to the input digit which will line up with \(y_1\). That would be \(\delta_{l-2}\) in A1+A2 and A3+A4 (with the already-mentioned quirk for AlgV), but for the rest it's \(\delta_{l-3}\).


Apart from the goofy return stack manipulation in place of calls to a helper function, the general structure of this type dispatcher ought to translate reasonably well to other languages. If you have an optimizing compiler (a common occurrence in C, C++, and their derivatives, among others), the callback part may be a hindrance, as the compiler will have a hard time reasoning about which parts of the overall program state the callbacks can affect. In that case it might be better to just pull down the computation of \(z_1\) into the various branches of the dispatcher, handle a \(0\) result with non-callback code, and let the compiler's optimization figure out the rest. On the other hand, some languages have features to help the compiler figure out what the callback is allowed to do: Haskell as a pure functional language definitely does, Rust might too, ...
Even if you don't go that far in adapting my solution, the conditions to tell apart types may be useful without all the other intricacies.
Code:
LABEL pickType.s
::
  NDROP BINT0
  3GETLAM GetElemBotVStack
  #>2case
  ::
(A1 and A2)
    3GETLAM #1- GetElemBotVStack
    3GETLAM GetElemBotVStack #1-
    INCLUDE AlgMain BINT1
  ;
  GetElemBotVStack #1=case
  ::
    BINT2 3PUTLAM
    BINT2 GetElemBotVStack
    DUPONE GetElemBotVStack
    #0=ITE
    ::
(A5,A6,part of B1,and part of B2)
      BINT0 GetElemTopVStack
      OVER#= ITE BINT2 BINT3
      #> 5GETLAM #1=
      ORcase COLA
(A5 and A6)
      RDROP
      BINT0 1GETLAM #1-
      ROT#1+
      AlgMain
      ::
        #1+SWAP 1GETLAM #+SWAP
        3PICK
      ;
    ;
(part of B1,part of B2,B6,and B7)
    #>2case
    ::
(B1,B2,B6,and B7)
(B6 is adjacent to B1)
(and has same initial values)
(B7 is contained in B2)
(and has same initial values)
(half of B4 overlaps with B7)
(as well,but has different)
(initial values)
      ONEONE GetElemBotVStack
      ROT #1-
      AlgMain BINT1
    ;
(B3,B4,and B5)
    DROP ONEONE
    GetElemBotVStack #1-
    1GETLAM #1-
    AlgMain
    ::
      BINT2 GetElemBotVStack #>1
(B3)
      NOTcase BINT1
(B4)
      ROT#1+UNROT OVER#1-_
    ;
  ;
(A3 and A4)
  ZEROZERO GetElemBotVStack #1-
  1GETLAM #1-
  AlgMain BINT1
;
@
You'll likely need some time to understand AlgMain, because not only does it interface with AlgV and the callbacks from pickType, it also implements the meat of all four main algorithms (Algorithm I, Algorithm II, Algorithm III, and Algorithm IV), most importantly the looped step, in a unified manner.

First it stores the \(0\) or \(1\) which is the marker to tell apart types A and B into a local variable, which I named the "index adjustment variable" in fromZINT because this \(0\) or \(1\) will be added to or subtracted from several indices throughout AlgMain. The value stays on the stack though, because for type B it doubles as the extra \(x_0\) present in the first palindrome of that type. For type A the \(0\) is just a harmless dummy value that'll stay there until AlgIfinish or AlgIIfinish reach into these depths again.
The two lines after that calculate the number of loop iterations. The formula for that involved a lot of experimentation and reasoning about what variables my code has access to at this point. Division by \(2\) is a given, same for using the length of the input number as a basis. Using the index to \(x_2\) from \(\delta_{l-1}\) is very much non-trivial, but it ticks all the boxes I need: types A5+A6 are handled as if they were A1+A2 or A3+A4 with an input number shorter by one digit; an AlgV-shortened A1+A2 variant is handled correctly because the pickType quirk of abusing the index to skip past the leading \(0\) is compensated by simply not updating the original input number length in AlgV; and the type B odd-length numbers (Algorithm III) have a smaller-by-1 iteration count compared to their type A peers (Algorithm I). For even-length type B numbers (Algorithm IV) and their type A peers (Algorithm II) this isn't true due to how this lines up with the floor() behavior of integer division, and the proof's phrasing suggests this is wrong, but it's actually perfect: in Algorithm IV, step \(m-1\) is equivalent to an additional iteration of the preceding looped step.
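As a loose Python model of those two lines (variable names are mine, assuming LAM4 holds the input length and LAM3 the index prepared at the end of pickType):
Code:
def loop_setup(input_length, y1_index):
    # quotient = number of loop iterations, remainder = the evenness
    # information used right below
    return divmod(input_length - y1_index, 2)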

As a byproduct of the division by \(2\), I receive the evenness almost for free in the form of the remainder. The type B quirk means I can't compare it directly against a constant \(1\) or \(0\), but I can compare it against the "index adjustment variable". AlgV is conditionally run based on this, and it will either return without effect (if its own supplementary checks say Algorithm V is not applicable), or clear the stack, discard the return stack levels pointing to here in AlgMain and to the callback in pickType, perform its adjustments, and retry pickType.
The remainder and type A / type B distinguishing marker are also used to determine which of the algorithms AlgMain shall act as. This isn't important yet, but it's going to be important after the loop, and I don't want to redo the calculation of the remainder, so I just use it here and store the appropriate subroutine in a variable. The order of AlgIVfinish and AlgIIIfinish isn't a mistake, it's another result of the remainder values being switched around for type B.
The iteration count is stored into a variable after the AlgV call because while it's beneficial to ultimately squirrel it away into a local variable for AlgIfinish, AlgIIfinish, AlgIIIfinish, and AlgIVfinish, there are several places inside AlgV which would like to see this still on the stack: it simplifies retrieving the middle digits from the virtual stack to check if they are \(0\), and when AlgV knows it won't return to AlgMain, it can use the value for its adjustments.


With this general preparation out of the way, it's time to explain the stack layout during the loop. More specifically, at the start of step \(i\) it looks like this:
- Buried the deepest are the already-generated digits of the first palindrome, i.e. \(x_0\) to \(x_{i-1}\).
- Next in order are the digits of the second palindrome, \(y_1\) to \(y_{i-1}\).
- The easiest-to-access partial palindrome is \(z_1\) to \(z_{i-1}\).
- Finally, a handful of "working variables". They obviously change over the course of the loop, but at the start of the step there are three of them: the carry \(c_{i-1}\) left over from the previous iteration, a non-buried copy of \(x_{i-1}\) for easy access, and a tentative value for \(x_i\).

The first carry is generated by computeCarryAndRem alongside \(z_1\). I've explained in pickType what happens in conjunction with that - a callback massages the values if \(z_1\) is a forbidden \(0\), effectively switching from e.g. type A1 to A2, or from type B6 to B7. This takes care of the last piece of step 1 in all four main algorithms.
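computeCarryAndRem itself was covered earlier; for following along here, the flavor of digit arithmetic it stands for is roughly this (a hedged Python guess at its job, not a transcription of the routine):
Code:
def carry_and_rem(g, target_digit, column_sum):
    # pick the unique in-range digit that makes the column match the target
    # digit mod g, and report the leftover multiple of g as the carry
    z = (target_digit - column_sum) % g
    carry = (column_sum + z - target_digit) // g   # exact by construction
    return carry, z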
The \(x_{i-1}\) copy is initialized from \(x_0\). Yes, in type A that's the dummy \(0\), but only type B will actually use this working variable. For type A the "index adjustment variable" redirects accesses elsewhere.
The "tentative \(x_i\)" represents what sets step 2 apart from the looped step: the only difference is in the definition of \(x_i\), where step 2 uses \(D(\delta_k-y_1)\) and \(D(\delta_k-y_1)-1\) (I define \(k\) here as \(l-2\) or \(l-3\), whichever lines up with \(y_1\); I have that index prepared in 3GETLAM for a reason), while the looped step uses \(1\) and \(0\). (Yes, I altered one of the formulae a little. It still yields the same result because no type sets \(y_1\) to exactly \(\delta_k\), so \(D(\delta_k-y_1)\) won't be \(0\), and the \(-1\) part is safe to pull out of \(D(\text{...})\).) Do you see how step 2's \(D(\delta_k-y_1)\) is switched out for \(1\) in the looped step? That's the "tentative \(x_i\)", and its first value is therefore \(D(\delta_k-y_1)\).


The loop starts with refining the tentative \(x_i\) into the final one. The comparison the proof gives for its piecewise definition of \(x_i\) involves values that also go into the definition of \(y_i\), so the calculations for these are blended together. Caught up in that blend is also a line advancing the LAM3 index, because the digit recalled with that index is one of the values used for \(x_i\) and \(y_i\).
Both of these are subsequently buried in the depths of the stack with UNROLLs. I keep a copy of both around, though, because they are still needed...
The \(y_i\) copy is used up immediately along with the previous carry \(c_{i-1}\) by adding them together. \(x_i\) goes on a somewhat more convoluted journey: type A needs it to be used up immediately as well, but type B needs it to hang around for a full loop iteration, with \(x_{i-1}\) used instead. I achieve this by placing \(x_i\) and \(x_{i-1}\) on adjacent stack levels and grabbing one or the other depending on the "index adjustment variable" LAM5. The other one stays for the next iteration. With type A that means the \(x_{i-1}\) working variable won't actually contain \(x_{i-1}\) beyond the first iteration (it'll remain the dummy \(x_0=0\)), but if it doesn't get used, who cares?
The rest of the loop is simple again: use up the chosen one out of \(x_i\) and \(x_{i-1}\), use up \(c_{i-1}+y_i\), and receive \(z_i\) and \(c_i\) from computeCarryAndRem in return. \(z_i\) gets stashed, \(c_i\) and the leftover \(x_i\) are arranged as two of the next iteration's working variables, and a constant \(1\) joins them as the tentative \(x_{i+1}\).

At the end, one of AlgIfinish, AlgIIfinish, AlgIIIfinish, and AlgIVfinish is responsible for filling in any still-missing digits in the middle of the palindromes, performing the correction step, and wrapping the palindrome digits into three lists. All of these are highly specific to one of the four main algorithms, so it's fine to let their paths diverge now.
Code:
LABEL AlgMain.s
::
  3PICK 5PUTLAM
  4GETLAM 3GETLAM #-
  BINT2 #/
  {
    INCLUDE AlgIfinish
    INCLUDE AlgIIfinish
    INCLUDE AlgIVfinish
    INCLUDE AlgIIIfinish
  }
  3PICK#1+_ 7PICK #2* #+
  NTHCOMPDROP 8PUTLAM
  SWAP 5PICK #=?SKIP INCLUDE AlgV
  7PUTLAM
(compute last initial value z_1)
(and first working var:prev)
(carry,then adjust initial)
(values if z_1=0)
  3PICK #2+PICK OVER
  computeCarryAndRem
  :: BINT0 GetElemTopVStack ;
  'R OVER#0= ITE
  ::
    SWAPDROP EVAL
    ROTOVER #- SWAPROT
  ;
  DROPSWAP
(second working var:x_i-1)
  4PICK
(third working var:tentative x_i)
  3GETLAM GetElemBotVStack
  5PICK #- BINTmod
(loop)
  7GETLAM ONE_DO
(calculate y_i and adjust x_i)
(if needed)
    3GETLAM #1+DUP 3PUTLAM
    GetElemBotVStack 5PICK
    2DUP #- #1-UNROT
    #>?SKIP SWAP#1-SWAP
    INCLUDE BINTmod
(save away copies of x_i and y_i)
    OVERINDEX@ #2* #5+ UNROLL
    DUPINDEX@ #5+ UNROLL
(calculate new carry and z_i)
    4ROLL #+
    5GETLAM #2+ROLL
    computeCarryAndRem
    :: INDEX@ GetElemTopVStack ;
(save away z_i and prepare)
(working vars for next iteration)
    SWAPROT BINT1
  LOOP
  8GETLAM COLA_EVAL
;
@
Before I descend into the "finishers" for the four main algorithms, let me sweep up another subroutine interacting with pickType and AlgMain: AlgV.
Algorithm V is meant to treat deficiencies of Algorithm II and Algorithm IV (the ones dealing with even-length numbers) that relate to occurrences of \(0\) in the middle pair of input digits. It does so by decrementing both of them (with normal carry into higher digits) and making up for the difference by adding it onto the (identical) middle digits of the first palindrome. As explained in pickType, there's a safety net which keeps the subtraction from making the input number drop into the range where the first palindrome no longer has a pair of central digits.
The proof gives special treatment to numbers where subtracting \(1\) from both central digits resolves the problem for one but merely shifts it over to the other. The special treatment amounts to applying AlgV twice, so I don't explicitly handle it, and just rely on a retry via pickType knocking at the AlgV door again. It's good to know that this recursion will fizzle out by itself after at most two retries (especially for the proof's correctness), but it doesn't lead to any visible consequences in an implementation.

What is visible, however, is that AlgV's checks aren't concluded after comparing the middle digits with \(0\). The other check compares the length of the input number with \(6\) - because while AlgS6 sends 6-digit numbers starting with anything other than \(1\) into the long-number code, one of its addenda to Algorithm II states that Algorithm V shall be suppressed.

After the checks go through, AlgV cleans up the stack. Only the index of the less significant one out of the middle digits is left standing, the other three items on the data stack have to go. The return stack is cleaned up at the end, right before the retry in the form of a pickType call: 2RDROP cancels the rest of AlgMain and the runstream pickType uses to pass a callback to AlgMain. The next runstream on the return stack points into fromZINT - or right at the end of AlgV, because a retry accumulates return stack levels. I could fix that via tail-recursion with COLA, but that costs an additional command, and the extra return stack levels don't hurt anybody since the recursion is tightly bounded.

Between these cleanup pieces the action of AlgV happens. The main objective is to decrement two adjacent digits on the virtual stack by \(1\) and propagate any resulting carry. I implemented it as "subtraction by hand", with the decrement of the more significant digit sneakily grafted onto the carry propagation (I increment the first carry). Another option would have been to dump the entire virtual stack onto the data stack, split it below the middle digits, use something like subAsZINTthenAlgS (without the AlgS part, somehow...) to subtract \(g+1\) from the upper part, then glue the parts back together and make a new virtual stack from them. I found that not worthwhile when I attempted it, but maybe I just need to give it another go. It would have different consequences on a leading \(0\) for sure, thanks to the round-trip through ZINT representation. As is, the "subtraction by hand" is moderately bulky but otherwise uninteresting - I keep the current carry and the virtual stack index it comes from on the data stack until the carry is \(0\).
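For a conventional-language picture of that subtraction (a hedged Python sketch on a plain digit list, not the virtual-stack code):
Code:
def decrement_middle_pair(digits, k, g):
    # digits: base-g digits, least significant first; k: index of the less
    # significant one of the two middle digits. Net effect on the number:
    # subtract (g+1) * g**k, i.e. 1 from each middle digit, with borrows.
    if digits[k] == 0:
        digits[k] = g - 1
        borrow = 1
    else:
        digits[k] -= 1
        borrow = 0
    borrow += 1    # graft the decrement of digits[k+1] onto the borrow chain
    pos = k + 1
    while borrow:  # AlgV's own checks keep this from running off the high end
        if digits[pos] >= borrow:
            digits[pos] -= borrow
            borrow = 0
        else:
            digits[pos] += g - borrow
            borrow = 1
        pos += 1
    return digits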

Beyond the carry propagation loop I change an offset stored in a local variable. I'm punting the responsibility for adding it onto the middle digits of the first palindrome down to AlgIIfinish and AlgIVfinish - that's more efficient than intercepting the palindromes returned by the recursion into pickType and performing surgery on one of them.
This offset is initialized in fromZINT as a neutral \(0\), but I have to make sure two nested AlgV invocations can cooperate, so I increment it by \(1\) instead of blindly setting it to some constant value.

Finally, I have to handle the rare case of carry propagation reaching the most significant digit and bringing it down to \(0\). As mentioned in pickType and AlgMain, I handle it by leaving this leading \(0\) in the virtual stack and ensuring that LAM3 is \(2\) when it happens (and \(1\) otherwise). A necessary precondition is that the input number was type A5 or A6 which (along with type B) get LAM3 set to \(2\) in the middle of pickType, so all I have to do is reset it to \(1\) for all the other type A5+A6 and B numbers. The length of the input number stays untouched, AlgMain relies on that to work properly, as noted there.

pickType expects a meta containing the input number's digits as parameter on the data stack, but all it does with it is to NDROP it all. I'm lazy and pass the retrying pickType an empty meta instead, represented by the recycled \(0\) carry left over from the manual carry propagation loop.
Code:
LABEL AlgV.s
::
(AlgV checks)
  DUP#1+ GetElemTopVStack
  OVER GetElemTopVStack
  2#0=OR NOT?SEMI
  4GETLAM #6- #0=?SEMI
(AlgV)
  4UNROLL3DROP
(subtract from less significant)
(middle digit)
  DUP GetElemTopVStack
  DUP#0=ITE
  :: DROP BINT2 1GETLAM #1- ;
  #1-1SWAP
  3PICK PutElemTopVStack
(propagate carry, which is)
(1 or 2 in the first iteration)
(to also subtract from more)
(significant middle digit)
  BEGIN
    SWAP#1+ SWAPOVER
    GetElemTopVStack SWAP#-
    1GETLAM OVER#< ITE
    :: 1GETLAM #+ ONESWAP ;
    ZEROSWAP
    3PICK PutElemTopVStack
  #0=UNTIL
(increase stored offset and)
(try again, applying it)
(is done at the end of)
(AlgIIfinish and AlgIVfinish)
  6GETLAM #1+ 6PUTLAM
  SWAPDROPDUP GetElemBotVStack
  #0=?SKIP :: BINT1 3PUTLAM ;
  2RDROP pickType
;
@
It's almost time for the algorithm finishers ... but for them we need to see a pair of helper functions. Their contents aren't hard to understand, but I'll take this opportunity to point out: until the very end of the finishers, the palindromes are represented as a series of separate digits on the stack - half a palindrome's worth for each, since maintaining the duplicates making up the other half across all changes would be quite burdensome. The knowledge of whether a palindrome should have one or two middle digits is implicit up to this point, but after the finishers are done and all the code paths converge again, it's lost - I have to apply it to the palindromes in time, which I do by picking which helper gets called.
Code:
LABEL makeEvenLenPal.s
::
  dup_
  reversym top& P{}N
;
@
Code:
LABEL makeOddLenPal.s
::
  dup_ SWAPDROP#1-_
  reversym top& P{}N
;
@
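Stripped of the stack handling, the two helpers amount to this (a minimal Python counterpart, names made up):
Code:
def make_even_len_pal(half):
    # [1, 2, 3] -> [1, 2, 3, 3, 2, 1]
    return half + half[::-1]

def make_odd_len_pal(half):
    # the last stored digit is the single middle digit and isn't duplicated:
    # [1, 2, 3] -> [1, 2, 3, 2, 1]
    return half + half[:-1][::-1]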
Time for a finisher - this is the one for Algorithm I, the simplest one. All four finishers have to take the digit groups generated by the loop in AlgMain as well as its leftover "working variables" and package them into proper palindromes.
Wrapping them in lists at the end is straightforward with the two helper functions shown above (though have a look at how I can temporarily park already-finished palindromes on the return stack - UNROLLing them out of the way instead would have been less efficient due to the variable meta length). Strictly speaking, the digits weren't stored as a proper meta because their length field was missing; that's fixed as part of packaging because the length of each palindrome can be derived from the number of iterations done, which has been stored into LAM7 by AlgMain. Keeping a count up-to-date throughout the main loop would have been much more work (ROLL each of them down, increment, and UNROLL back), and that's not even counting how much they would have gotten in the way of the other operations in the loop.
The more interesting part of the finishers is filling in any missing digits (Algorithm I has just one, \(x_{m+1}=0\)), and the correction step with its nested case structure. Generating the missing digits is weaved into the correction step's code because as central digits in the palindromes they are a prime target for corrections. That also means UNROLLing them into their proper places on the stack occurs after all the nested cases.

Algorithm I has just three cases. Due to a major issue in version 2 of the proof (an optimization on the authors' side works for 7-digit and longer numbers, but it breaks the assertion that numbers with only 5 digits starting with any digit other than \(1\) can use Algorithm I without changes), I'm falling back to version 1. That means case I.3 has two sub-cases. If you're interested in the mathematical background of this issue, it's a consequence of 5-digit numbers using only step 2, with no iterations of the proof's looped step. In my AlgMain implementation that means a single iteration with a "tentative \(x_i\)" that may not be \(1\) like in subsequent iterations. The last "tentative \(x_i\)" is what decides which of the sub-cases of I.3 is needed; values other than \(1\) lead to the I.3.ii case that got optimized away.

The reasoning for version 2's optimization is nonsense, too. It talks about \(\delta_{m-1}=0\) not being allowed, which appears to be erroneously copied from Algorithm II where that is a result of Algorithm V adjusting the input number to make it so. However, it still (mostly) holds: \(y_m=g-1\) and \(x_m=1\) can't occur together because \(y_m=g-1\) would imply that \(\delta_m=z_{m-1}\), which clearly leads to the second branch of the definition of \(x_m\), i.e. \(0\). Unless \(m=2\), that is - then \(x_m\) can have higher values due to the "tentative \(x_2\)" not necessarily being \(1\). That's the core of the 5-digit logic hole made obvious - \(m=2\) corresponds to a 5-digit number.

Annoyingly, I left a little optimization potential behind. As irritated as I am about that, it also presents an opportunity to give you some insight into my methods for arriving at optimized code. They may not be the best for you, but they work for me.


First of all, this part of the code was completely rewritten when the 5-digit issue was discovered. I still have some memories of the previous version, and an important piece was that cases I.2 and I.3 (in its buggy form) used the same value for the filled-in digit, i.e. \(x_{m+1}=1\). That led to a nesting structure that separated I.1 from the other two cases first, set the value of \(x_{m+1}\) accordingly, and then separated I.2 and I.3.
With I.3.i from version 1 on the table, that's no longer an appropriate branching structure, because it has \(x_{m+1}=0\) like I.1. Invalidating such a deeply baked-in decision is a reason for a rewrite - I'm not afraid of rewrites; attempting to fix already-optimized code instead (to work with a shifted foundation) often leads to a mess: the perceived benefit of continuity merely keeps hard-to-read (previously) optimized code around and buries it under more layers of hard-to-read code, and usually it won't even be fully optimized anymore because the new layer has to work around the kinks of the old code. If you're concerned about maintenance, embrace rewriting so that you only have that one layer of optimized code, or don't optimize this aggressively in the first place.

My next attempt lumped I.1 with I.2, and separated I.3 first. I decided to deviate from the proof's instructions and initialize \(x_{m+1}\) to \(1\), then overwrite it with \(0\) in cases I.1 and I.3.i. That's reversed from the proof; additional options were: delaying initialization such that the cases push a new value to the stack instead of replacing one, or pushing both values to the stack and utilizing SysRPL combo commands to DROP one or the other. I counted commands for all four options and found this to be the best one. The more thoroughly you explore your options, the better you will be able to find the optimal one.

However, afterwards I spotted something else: the differences between both sub-cases of I.3 are the new values for \(x_{m+1}\) and \(z_m\), and taken together as a 2-digit number these always have the value \(z_m+1\), i.e. I.3.ii is a kind of "error" case where the increment I.3.i prescribes for \(z_m\) goes out of range. That triggered a rewrite of that branch, such that I don't even explicitly distinguish between the sub-cases anymore, and just obtain the new values from 1GETLAM #/ (quotient into \(x_{m+1}\), remainder into \(z_m\)). That requires unconditionally overwriting \(x_{m+1}\), and I failed to realize that this impacts my previous choice for this digit - only I.1 and I.2 are interested in that choice now, which means pushing both values initially and discarding one or the other is one command shorter than the code below.
With that revised choice I.3 needs to discard both, i.e. one more than it currently does, but that merely changes the 2DROP into 3DROP. The code for the I.1 and I.2 branch doesn't change length either, #0= ?SWAPDROP is still two commands inside that secondary. But getting the two values in the appropriate places on the stack is shorter by one command because I recycle the "working variables" from AlgMain's loop: \(c_m\) needs to be brought from level 3 to level 1 because it's used for the decisions, and the other two are always \(0\) (the dummy value where type B numbers would keep a copy of \(x_{i-1}\) for use in the next iteration; Algorithm I deals with type A numbers instead) and \(1\) (the next "tentative \(x_i\)"). SWAPDROPSWAP therefore needs to be replaced with ROT, which is free because it combines with the DUP of the \(c_m=2\) check into ROTDUP.
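Spelled out in Python, that merged I.3 handling is a single division (quotient into \(x_{m+1}\), remainder into \(z_m\), exactly as described):
Code:
def correct_I3(z_m, g):
    # quotient -> x_{m+1}, remainder -> new z_m; I.3.ii is simply the case
    # where the +1 overflows out of the digit range
    return divmod(z_m + 1, g)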

This is a situation I caused myself by optimizing the handling of \(x_{m+1}\) before the I.3 branch had its formulae set in stone, essentially a case of the "premature optimization" boogeyman (if you ignore that changing the formulae was itself an act of optimization). I subscribe to the belief that optimization can't be premature - after all, choosing the right algorithm for a job is "optimization" as well, and you do that very early on. If your assignment is to sort values (and you are not allowed to use library functions for some reason), you'll look for sorting algorithms ... Bubblesort? Nah, it's terrible for anything beyond basic computer science education with blackboard and chalk. Quicksort? Maybe. Mergesort? Also maybe. Perhaps even pluggable algorithms, using some kind of callback. The point is that you necessarily pick an option before you write the code for it. If you get it right immediately, you don't need to throw away code you've spent time and effort to write, and you don't need to recall or reconstruct mental context for additions and replacements.
And don't get me started on optimizing for developer time, which is definitely an optimization target too. The "we can optimize it later" mindset that's especially prevalent in the communities around modern scripting languages, but also present in compiled ones, leads exactly to this repeated revisiting. Each time you think "oh, what was this for again, let me read the docs I wrote back then, ahh crap they don't exist", you're using up the time you gained from your earlier laziness, and then some. Get it right on the first try (or maybe second), and you'll likely save time overall.


Well, that turned into a rant - apologies for that. Back to the SysRPL code. The only thing left to point out is that type A numbers have a dummy \(0\) beyond the palindromes. That gets kicked out by the SWAPDROP near the end of the packaging section.
Code:
LABEL AlgIfinish.s
::
  SWAPDROPSWAP
  DUP #2= ITE
  ::
(I.3)
    2DROP
    7GETLAM DUP#2+PICK_
    #1-SWAP #1+UNPICK_
    #1+ 1GETLAM #/
  ;
  ::
(I.2)
    #0=?SEMI
(I.1)
    DROPZERO
  ;
  7GETLAM #2* #1+UNROLL
  7GETLAM INCLUDE makeOddLenPal
  OBJ>R_
  7GETLAM INCLUDE makeEvenLenPal
  OBJ>R_
  7GETLAM #1+ makeOddLenPal
  SWAPDROP R>OBJ_ R>OBJ_
;
@
Algorithm II has many more cases and sub-cases in its correction step than Algorithm I. It also defines two additional digits instead of just one - and in a small oversight, the proof skips the explicit declaration of \(c_m\). Some of these traits happen to be shared with Algorithm III, especially the matter of calculating another carry and remainder (which goes into \(y_m\) for both). I spotted this code sharing potential by thinking "wait, didn't I just write something very similar?" That's clearly not a universally applicable method for optimization, but it shows that a good memory and/or very thorough examination of the algorithm you're implementing can occasionally turn up code sharing opportunities in places you didn't fully expect.

Again, there's optimization potential left. It's even easier to spot than in Algorithm I, if you remember that the middle one out of AlgMain's "working variables" will have a value of \(0\) for type A input numbers: replace 2DROP ZEROSWAP with DROPSWAP. I must've been half asleep when I wrote this garbage - no lessons here other than: don't do that, get your sleep! Big Grin

Inside the already deeply nested case II.2.ii.c, the addendum for 6-digit numbers is embedded with even more impressive indents. The S6.not1 prefix in my case-referencing comments makes it easy to find. It consists of an unnamed general case (\(x_2\ne 0\)) and four named ones, though out of these iv is merely theoretical. Furthermore, i is a special case of iii, so I handle them together - the conditions \(x_2=0\) and \(y_1=g-1\) suffice.
They are also the only ones to deviate from the palindrome lengths normally prescribed by Algorithm II. Since all of the levels of nesting are entered via the "case" command, they don't leave a return stack level, so I can skip the normal packaging for Algorithm II with a simple RDROP (combined with DROP here) and proceed with packaging in the style of the "short / small" numbers, i.e. with fixed-length palindromes constructed explicitly in full length (which is more efficient than the reversym-based helpers at these lengths). Even the dummy \(0\) found beyond \(x_1\) on the stack gets recycled in the process.


Allowing 6-digit numbers into the algorithm principally made for 8-digit and longer numbers doesn't lead to issues like in Algorithm I, even though Algorithm II also skips the looped step for 6-digit numbers and therefore is similarly susceptible to the last "tentative \(x_i\)" being something other than \(1\). Changes between version 1 and version 2 suggest the authors realized the issue this time - but only due to types A5 and A6 on 7-digit numbers being so similar to 6-digit numbers, as the new text in II.2.ii.c reveals. Interestingly, they didn't ban Algorithm V (which can shorten them into real 6-digit numbers of type A1 or A2!) or mandate an altered version of Algorithm II in that situation, which makes me question whether those addenda are even necessary anymore. I guess preventing Algorithm V from retrying with a 6-digit number starting with \(1\) is needed after all, because that would be type B, and Algorithm IV probably doesn't like numbers of that length, but since they are so rare, can they be intercepted more efficiently somehow, and Algorithm V let loose on the rest of 6-digit numbers? And can the replacement for II.2.ii.c be reduced or eliminated?
The resulting palindromes would be different, so I didn't investigate further, but it's especially telling that the changes between versions eliminate the old II.2.ii.b and split off a new one from II.2.ii.c - the case affected by both addenda for 6-digit numbers. The additional condition \(z_{m-1}=0\) for what's left under the II.2.ii.c label even makes the phrasing in the 6-digit addenda outdated: "[...] we have to account also for the possibility \(z_2=0\) in the Step II.2.ii.c. This is the temporary configuration in Step II.2.ii.c (\(c_2=0\), \(y_3=y_2=0\)) with \(z_2=0\). [...]" I'll leave that there as food for thought.

With the 6-digit-specific addendum done, there isn't much interesting code left here. The addendum is activated based on the original number length, so it's activated on the same numbers as the ban on Algorithm V. Specifically, this excludes those which turned into 6-digit numbers under the influence of Algorithm V, just in case one of them hits 6-digit special cases i or iii which don't follow the normal packaging code. It's also a plausible interpretation based on the proof's hard split between long numbers and the almost-an-afterthought short / small ones - in its conclusions it avoids forward references like the plague, so relying on Algorithm V-caused 6-digit numbers always getting a 6-digit palindrome seems a bit of a stretch when the 6-digit cases are listed many pages later.
Speaking of Algorithm V, concealed in that normal packaging code there are the commands 6GETLAM #+ which apply the Algorithm V adjustment to the first palindrome's middle digit pair.
Code:
LABEL AlgIIfinish.s
::
  2DROP ZEROSWAP
  INCLUDE AlgIIandIIIcarry
(II.1)
  DUP#1= ITE_DROP
  ::
    #0=case
    ::
(II.2)
      SWAP#1+SWAP
(II.2.i)
      DUP#0= NOTcase #1-
(II.2.ii)
      DROP 7GETLAM #2+ROLL
      DUP#0=csedrp
      ::
        OVER #0=case
        ::
          4GETLAM #6- #0=case
          ::
(S6.not1)
            5ROLL DUP#0=csedrp
            ::
              SWAPDROP 3PICK#+
              1GETLAM #=case
              ::
(S6.not1.i and S6.not1.iii)
                DROPRDROP
                UNROT #1+SWAP
                ZEROZEROZERO
                5PICK BINT6 P{}N
                ONEONE TWO{}N
                ROT #3- ONE{}N
              ;
(S6.not1.ii)
              ROT #1-UNROT
              1GETLAM #1-UNROT
              SWAP#1+SWAP
              ZEROSWAP
              BINT1 BINT0
              1GETLAM #2-
            ;
            #1- 5UNROLL
            DUP4UNROLL
            1GETLAM #1-SWAP
          ;
(II.2.ii.c)
          BINT2 ROTDROPSWAP
          1GETLAM #1- 4UNROLL
          6ROLL #1- 6UNROLL
          1GETLAM #4-
        ;
(II.2.ii.b)
        SWAP#1-SWAP
        7GETLAM #1+UNROLL
        #ZERO#ONE
      ;
(II.2.ii.a)
      #1- 7GETLAM #2+UNROLL
      SWAP#1+SWAP
      1GETLAM #2-
    ;
(II.3)
    DROPSWAP #1-
    4ROLL #1- 4UNROLL
    ONESWAP
  ;
  7GETLAM #2+UNROLL
  7GETLAM #2* #2+UNROLL
  7GETLAM makeEvenLenPal OBJ>R_
  7GETLAM #1+ makeOddLenPal
  OBJ>R_ 6GETLAM #+
  7GETLAM #1+ makeEvenLenPal
  SWAPDROP R>OBJ_ R>OBJ_
;
@
Code:
LABEL AlgIIandIIIcarry.s
::
  3PICK computeCarryAndRem
  :: 7GETLAM GetElemTopVStack ;
  SWAP
;
@
As an algorithm for type B numbers Algorithm III actually uses the \(x_{i-1}\) "working variable" (unlike Algorithm I and Algorithm II). That is quite useful, because the formula for the extra digit \(y_m\) and the carry \(c_m\) needs it. Thus it goes into the parameter for the preceding snippet shared with Algorithm II. It's also altered in III.3.iii and III.3.iv, but attempting to keep it around (to save the commands for fetching the copy out of the first palindrome's digit group) is not worth the commands spent on DROPping it in the other cases and working around the additional occupied stack level.

Type B also means no more dummy \(0\) behind the palindromes. The \(1\) taking its place gets incorporated into the longest palindrome as \(x_0\).

In the four sub-cases of III.3 we have the unusual situation of two conditions for distinguishing them which independently control separate pieces of the adjustments: \(y_{m-1}=0\) pairs III.3.i with III.3.ii and III.3.iii with III.3.iv in adjusting \(y_{m-1}\) itself and \(x_{m-1}\); separately \(z_{m-1}\) pairs III.3.i with III.3.iii and III.3.ii with III.3.iv in adjusting \(z_{m-1}\) itself and both of the digits filled in after the loop, i.e. \(x_m\) and \(y_m\). Consequently I implemented them as two sequential ITE checks. Because I flipped the initial value of \(x_m\) compared to the proof, the latter gets downgraded into an IT, omitting the else-branch to save a command (and the code to handle III.1 and III.2 gets more or less swapped, but that's a neutral change as far as my optimization targets are concerned).
Thinking of these four sub-cases as a 2x2 grid may help - there are digit adjustments common across a row, and other digit adjustments common across a column.
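Structurally, that decomposition looks like the following generic Python sketch; the adjustment functions are placeholders passed in by the caller, not the actual formulae:
Code:
def two_axis_correction(state, row_cond, adjust_row_a, adjust_row_b,
                        col_cond, adjust_col):
    # first check (the ITE): one of two adjustments to y_{m-1} and x_{m-1}
    state = adjust_row_a(state) if row_cond else adjust_row_b(state)
    # second check (the downgraded IT): adjust z_{m-1} and the filled-in
    # digits only on one side
    if col_cond:
        state = adjust_col(state)
    return state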
Code:
LABEL AlgIIIfinish.s
::
  UNROT #+
  AlgIIandIIIcarry
(III.2)
  DUP#0= ITE_DROP
  ::
(III.1)
    #1=case SWAP#1-SWAP
(III.3)
    ROT#1+
    1GETLAM OVER#= ITE
(III.3.ii and III.3.iv)
    DROPZERO
    ::
(III.3.i and III.3.iii)
      SWAP#1- SWAPROT
      #1-UNROT
    ;
    UNROT
    7GETLAM #3+ ROLL
    DUP#0=IT
    ::
(III.3.iii AND III.3.iv)
      DROP
      7GETLAM #2* #1+DUP
      #2+ROLL #1-SWAP #1+UNROLL
      1GETLAM
    ;
    #1-
    7GETLAM #3+ UNROLL
  ;
  7GETLAM #2+UNROLL
  7GETLAM #2* #2+UNROLL
  7GETLAM makeEvenLenPal OBJ>R_
  7GETLAM #1+ makeOddLenPal
  OBJ>R_
  7GETLAM #2+ makeOddLenPal
  R>OBJ_ R>OBJ_
;
@
Almost through - here comes the finisher for Algorithm IV. The proof fills in values \(x_{m-1}\), \(y_{m-1}\), and \(z_{m-1}\). Like the preceding two algorithms it also uses \(c_{m-1}\) without defining it, but the formula for a run-of-the-mill carry isn't hard to guess. With that little pothole patched, these four values amount to an additional iteration of the main loop, which AlgMain has already taken care of.

The correction step consists of many more cases than the other three main algorithms, and its first layer of dispatching uses not just \(c_{m-1}\) but also \(x_{m-1}\). The proof even phrases it as dispatching on the sum of these two values, but then it proceeds to look into the summands in order to tell apart IV.4 and IV.5, so I broke it down into two layers: dispatching on \(x_{m-1}\) with an ITE, and then inside its two branches "case" variants checking the three possibilities for \(c_{m-1}\). This duplicates case IV.1 because it's now split over both branches of the ITE with one case on each side, but that's no big deal since it's the do-nothing case. That leaves two cases per ITE side, with IV.2+IV.3 and IV.4 on the \(x_{m-1}=0\) side, as well as IV.5 and IV.6 on the other.
Conveniently, \(c_{m-1}\) and \(x_{m-1}\) are present on the stack in the form of AlgMain's "working variables". The third of those (the constant \(1\)) is DROPped immediately.
Since each branch knows the value of \(x_{m-1}\) thanks to dispatching on it, my adjustments to that value don't involve ROLLing it down to level 1, subjecting it to addition or subtraction, and UNROLLing it back, as is done for most adjusted variables across all four main algorithms' adjustment steps. Instead I just UNPICK a constant onto the correct stack level.

In IV.2 (let's postpone its companion IV.3 for a moment) the proof's cases separate out an easy case named IV.2.i with the condition \(z_{m-1}\ne 0\) first. The rest gets split into two cases IV.2.ii and IV.2.iii on \(y_{m-2}\ne 0\), and both of those have three sub-cases with quite similar figures. The conditions are slightly different, but that's fixable: IV.2.ii.c says that \(z_{m-2}\) can't be anything other than \(g-1\), so I can restructure the conditions such that IV.2.ii.a keeps only \(z_{m-2}\ne g-1\) and IV.2.ii.c gains \(z_{m-2}=g-1\) in addition to the \(y_{m-1}=1\) it already had. That aligns the sub-case conditions between IV.2.ii and IV.2.iii, and the figures inside are also eerily similar - the only differences between them are how \(y_{m-2}\) and \(x_{m-2}\) are affected. That means they all make a 2x3 grid of cases, so I can separate the choice between IV.2.ii and IV.2.iii from the sub-case choice like I did in Algorithm III. One part of this is even moved to the separate file AlgIVfinish2and4ii, because the adjustments that are different between IV.2.ii and IV.2.iii and common across their respective sub-cases are the same as the adjustments that are different between IV.4.ii.a and IV.4.ii.b, including the condition to decide between them.
Remodeling the conditions of IV.2.ii's sub-cases also flips the suggested nesting order of checks: \(z_{m-2}=g-1\) first, then \(y_{m-1}=1\). I remodeled this case's conditions instead of the other one's because this order lets me share the code for common adjustments to \(z_{m-2}\) and \(z_{m-1}\) in sub-cases .b and .c.

IV.3 has a condition which suggests splitting it away from IV.2 early, even before IV.2.i. However, the conclusions reached in IV.3, notably \(z_{m-1}=0\), \(y_{m-2}\ne 0\), and \(z_{m-2}\ne g-1\), let me trace a path through IV.2's sub-cases into IV.2.ii.a. Indeed, a close look shows that IV.3 is a special case of IV.2.ii.a with a distractingly complicated description, so I don't need extra code for it at all.

IV.4 has an easy case in IV.4.i to be separated early and a somewhat more complex IV.4.ii which has enough similarity with IV.2 that the code written for the differences between IV.2.ii and IV.2.iii also works for IV.4.ii.a and IV.4.ii.b. As already mentioned, that part is shared via a separate library command - the two places which need to share this snippet are nested too deeply to receive any advantages from attempting to share via runstream manipulation like in pickType.
Finally, there is IV.4.iii which has some superficial similarity with the sub-cases of IV.2.ii, but in the details it's too different to share code with those. In the end the amount of code sharing I was able to achieve was limited to common pieces of the unnamed sub-cases in IV.4.iii.a and similarly the common pieces of the unnamed sub-cases in IV.4.iii.b. The rest of the adjustments had to be done with a separate set of commands.

On the \(x_{m-1}=1\) side, IV.6 is thankfully short and simple, with no sub-cases. IV.5 compensates for that, hitting the proof's reader with another large set of them. They aren't so bad though, because they break down into just three groups.
IV.5.i and IV.5.ii clearly belong together with a shared \(z_{m-1}\ne g-1\) condition and similar adjustments to the palindrome digits. The latter is nothing more than an "error case" for the former, preventing an out-of-range \(y_{m-1}\) with a simple extra adjustment.
IV.5.iii.a has to fend for itself. Oh well, it's just a single case.
The remaining nine cases can be arranged into a 3x3 grid. It can even be reduced to a 2x3 grid by spotting that the difference between sub-cases .b and .c (or .c and .d instead for IV.5.iii) amounts to a single digit \(y_{m-1}\) set to a different value, and the condition to select between them checks if that digit has a certain value - so they can be merged with a formula in place of constants for that digit. Then, it can be reduced further to a 2x2 grid by merging IV.5.iii (or what's left of it after IV.5.iii.a has been thrown out) with IV.5.iv: the differences are the adjustments for \(y_{m-2}\) and \(z_{m-2}\), which both are a constant in one and variable in the other, but transferring the variable formula to the constant case achieves the correct result.


There may be further optimizations possible, but they are not as trivial as those I showed in the examination of the other algorithms' correction steps. Ideas I have include pulling a bunch of digits touched by many of the cases out of the digit groups down to fixed spots on the stack ahead of time (deduplicating all that 7GETLAM stuff), and merging \(x_{m-1}\) with \(y_{m-1}\) into a two-digit number for the duration of the correction step (they seem to be frequently adjusted as if they were just that). For the other algorithms, neither appears useful (apart from I.3 which essentially does have the two-digit number trick already), but here the wide decision tree of cases may push them over the edge.
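To make the second of those ideas slightly more concrete (a sketch of the idea only - it hasn't been checked against the actual cases):
Code:
def merged_adjust(x_m1, y_m1, g, correction):
    # merge the two digits into one two-digit base-g value, apply the
    # correction as a plain addition/subtraction, then split back
    w = x_m1 * g + y_m1 + correction
    return divmod(w, g)   # (new x_{m-1}, new y_{m-1})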
Code:
LABEL AlgIVfinish.s
::
  DROP #0=ITE
  ::
(IV.1)
    DUP#1= caseDROP
    #0=case
    ::
(IV.2 and IV.3)
      DUP#0=case
      ::
        INCLUDE AlgIVfinish2and4ii
        SWAP#1+
        1GETLAM OVER#= ITE
        ::
(IV.2.ii.b,IV.2.ii.c,)
(IV.2.iii.b,and IV.2.iii.c)
          DROP BINT3 BINT1
          7GETLAM #2+ROLL
          DUP#1= ITE
(IV.2.ii.b and IV.2.iii.b)
          :: 1GETLAM #+ ;
(IV.2.ii.c and IV.2.iii.c)
          SWAP#1+SWAP
          #2- 7GETLAM #2+UNROLL
        ;
        ::
(IV.2.ii.a,IV.2.iii.a,and IV.3)
          SWAPDROP
          7GETLAM ROLL #1-
          7GETLAM UNROLL
          ONEONE
        ;
        7GETLAM #2* #1+UNPICK_
      ;
(IV.2.i)
      #1-
      7GETLAM #1+ROLL #1+
      7GETLAM #1+UNROLL
    ;
(IV.4)
    #1+ 7GETLAM #1+ROLL
    1GETLAM 3PICK #<>case
(IV.4.i)
    :: #1- 7GETLAM #1+UNROLL ;
    ROT#1+ ROT OVER#=case
    ::
(IV.4.iii)
      #2- SWAP2DUP #>case
      ::
(IV.4.iii.a)
        #2+ 7GETLAM UNROLL
        DUPDUP
        7GETLAM #2* #1+UNPICK_
        7GETLAM #2+ROLL
        #1+DUP 1GETLAM #=ITE
        DROPZERO
        ::
          7GETLAM #2* #1+DUP
          #2+ROLL #1-SWAP
          #1+UNROLL
        ;
        7GETLAM #2+UNROLL
      ;
(IV.4.iii.b)
      #3- 7GETLAM UNROLL
      DROPZERO BINT2
      7GETLAM #2* UNPICK_
      7GETLAM #1+ROLL DUP#0=IT
      ::
        DROP
        7GETLAM #2* DUP#1+PICK
        #1-SWAP UNPICK_
        1GETLAM
      ;
      #1- 7GETLAM #1+UNROLL
      BINT3
    ;
(IV.4.ii)
    SWAP #2- 7GETLAM UNROLL
    ONEONE 7GETLAM #2* #1+UNPICK_
    AlgIVfinish2and4ii
  ;
  ::
(IV.1)
    DUP#0=csDROP
    #1=case
    ::
(IV.5)
      #1+ 1GETLAM OVER#=case
      ::
(IV.5.iii,IV.5.iv,and IV.5.v)
        7GETLAM #2+ROLL #1+SWAP
        OVER#=
        3PICK #0= ORNOT case
        ::
(IV.5.iii.a)
          7GETLAM #1+UNROLL
          7GETLAM ROLL #1+
          7GETLAM UNROLL
          BINT0
          7GETLAM #2* UNPICK_
          #1- 1GETLAM #2-
        ;
        DUP#1= IT
        ::
(IV.5.v)
          1GETLAM #+
          7GETLAM #2* #1+DUP
          #1+ROLL #1-SWAP UNROLL
        ;
        #2- 7GETLAM #1+UNROLL
        7GETLAM ROLL DUP #>1 ITE
        ::
(IV.5.iii.b,IV.5.iv.a,)
(and IV.5.v.a)
          BINT2
          7GETLAM #2* UNPICK_
        ;
(IV.5.iii.c,IV.5.iii.d,)
(IV.5.iv.b,IV.5.iv.c,)
(IV.5.v.b,and IV.5.v.c)
        :: 1GETLAM #+ ;
        #2- 7GETLAM UNROLL
        #1+ BINT1
      ;
(IV.5.i and IV.5.ii)
      7GETLAM #1+ROLL
      DUP#0=IT
      ::
(IV.5.ii)
        7GETLAM #2* UNPICK_
        1GETLAM
      ;
      #1- 7GETLAM #1+UNROLL
    ;
(IV.6)
    7GETLAM #1+ROLL #1-
    7GETLAM #1+UNROLL
    DROPZERO
  ;
  7GETLAM makeOddLenPal OBJ>R_
  7GETLAM makeEvenLenPal OBJ>R_
  6GETLAM #+
  7GETLAM #1+ makeEvenLenPal
  R>OBJ_ R>OBJ_
;
@
Code:
LABEL AlgIVfinish2and4ii.s
::
(IV.2 and IV.4.ii)
  7GETLAM #2+ROLL DUP#0=IT
  ::
(IV.2.iii and IV.4.ii.b)
    DROP
    7GETLAM #2* #1+DUP
    #1+ROLL #1-SWAP UNROLL
    1GETLAM
  ;
  #1- 7GETLAM #2+UNROLL
;
@

That was the final source file. Phew! Any questions?